How much time and money can AI save government? Cognitive technologies could free up hundreds of millions of public sector worker hours
- Introduction: AI-based technology brings both optimism and anxiety
- Breaking government work into tasks to clarify AI’s effects
- What do government workers do all day?
- Activities most likely to be automated
- AI shows enormous potential for labor time savings
- Conclusion: Minimizing disruption and enabling innovation
- Appendix: Data and methods
Farewell, paperwork? New cognitive applications could well mean government doing more with less—less work, that is, not necessarily fewer workers. Depending on the level of investment, AI technology could allow people to spend fewer hours on noncore tasks and more on client service and creative work.
Introduction: AI-based technology brings both optimism and anxiety
All kinds of institutions today run on data, and that means endless staff hours spent inputting, processing, and communicating. The work needs to get done, so someone has to spend that time pecking away at a keyboard, right?
The promise of reducing—or even eliminating—all that drudge work is one reason why many managers are enthusiastic about new applications based on artificial intelligence (AI). Finally, staff resources could be freed up to do real work, with people having time to focus on creative projects and deal directly with clients and customers.
But of course, there’s no guarantee that any new labor-saving technology will make everyone’s daily lives more rewarding rather than simply wiping out entire categories of employment.1 And that’s why AI applications make plenty of people anxious as well, especially since cognitive technologies are increasingly capable of carrying out tasks once reserved for knowledge workers.2
Technology, from farm equipment to factory robots to voice mail, has always displaced low-skilled workers. But only recently has it threatened white-collar professionals’ positions: Computer scientists are building machines capable of carrying out almost any task, even those—such as composing music—seemingly at the core of our humanity.3 Knowledge workers, whose jobs once seemed secure, are feeling directly threatened for the first time.
So there’s a blend of anticipation and dread within a wide range of organizations and industries—and public-sector agencies are no exception.4 Conversations with government executives suggest that most lack a clear vision of how AI applications might affect their staff and missions, which is understandable, since prior research hardly offers an actionable forecast. The US Bureau of Labor Statistics optimistically predicts that government workforces will see almost no job losses between now and 2024,5 while a recent study by Deloitte-UK and Oxford University suggests that up to 18 percent of UK public-sector jobs could be automated by 2030.6
We’ve attempted to bring some clarity to this confusion for agency chiefs planning their future workforce needs. In our view, the key to planning ahead is understanding how much time cognitive technologies could save. And indeed, our research, based on a new method for studying AI-based technology’s effects on government workforces, indicates that cognitive technologies could free up large numbers of labor hours by automating certain tasks and allowing managers to shift employees to tasks requiring human judgment.
These new applications could save hundreds of millions of staff hours and billions of dollars annually. But the shift’s size and impact will depend on many factors, some political and some financial. With adequate investment and support, we believe, AI could free up as much as 30 percent of the government workforce’s time within five to seven years. Lower levels of investment and support would yield lower savings, of course: Minimal investment in AI would result in savings of just 2 to 4 percent of total labor time.
Breaking government work into tasks to clarify AI’s effects
You need to know where you are before you can decide where you’re going; this truism certainly applies to predicting the effects of AI on government work. Most existing quantitative models begin by tallying workers by occupation and predicting which jobs will be replaced by technology. In other words, they rely on occupations as the unit of analysis.7
But we know from a long history with these issues that technology typically doesn’t replace jobs wholesale, at least at first.8 Instead, it often substitutes for specific tasks, while the workers who previously performed them shift to jobs complementary to the new technology. Over time, technology often results in a complete rethinking of what organizations produce and what the goal of that production is. Recent history shows this pattern has also been true for government work (see sidebar, “How cartography went digital”).
How cartography went digital
The US Geological Survey (USGS) began producing topographic maps of the nation in 1879,9 and for most of its history, it printed its maps on paper. If you were an active hiker or camper in the 1980s, you’ll likely remember shelves and shelves of USGS topo maps at outdoor stores, but over the following decade, USGS transformed its mapmaking techniques by embracing digital map production. This transformation, which relied on a major Reagan-era investment in geospatial information systems technology, was disruptive and productive. It significantly improved the efficiency of production—and completely changed the nature of cartographers’ jobs.10
Before the transformation, USGS cartographers worked as skilled craftsmen, performing painstaking tasks such as drawing elevation contours on acetate sheets. Today, their duties primarily involve collecting and disseminating digital cartographic data through the National Map program.11
USGS officials recall the transformation as a bumpy one. Veteran cartographer Laurence Moore says, “We were slow to appreciate how fundamentally GPS and digital map data would change the world, and tended to think of these technologies as just tools to produce traditional maps faster and cheaper.”
Today, the agency employs only a tenth of the cartographers working there at the peak of the paper-map production era. But paradoxically, the total number of cartographers and photogrammetrists employed by federal, state, and local governments has risen by 84 percent since 1999.12 And the Bureau of Labor Statistics forecasts a 29 percent growth in employment for cartographers and photogrammetrists through 2024, largely due to “increasing use of maps for government planning.”13
Deloitte has developed a new methodology for measuring the amount of time government workers spend on the tasks that fill up their work days. We believe we’re the first to quantify government work at the task level. The appendix explains details of our method.
For this article, we’ve applied this method to the federal civilian workforce and to the workforce of a large, representative Midwestern state (figure 1). The state was chosen due to the similarity of its workforce to many other state governments and because it provides detailed open workforce data through cutting-edge transparency. We expect patterns we find in this state to be broadly applicable to a number of others.
What do government workers do all day?
So how do government workers spend their time? We estimate that the federal workforce works 4.3 billion hours a year and the state government workforce 108 million. We group the tasks they perform into “generalized work activities,” using the US Department of Labor’s (DOL’s) O*NET activity framework.14
For both federal and state workers, by far the most time-consuming activity is documenting and recording information, a task capturing 10 percent of both federal and state government work hours. And while a few workers undoubtedly love documentation for its own sake, for most this activity surely isn’t the most rewarding part of the day.
Few observers will be surprised to find that paperwork can get in the way of government workers’ more critical functions15—just think of, for instance, all the times you’ve seen TV police officers groan over having to write and file lengthy reports. But the amount of time devoted to seemingly peripheral activities is sobering.
A quick glance at figure 2 shows, unsurprisingly, several tasks that might be highly amenable to automation. Now consider figure 3: the five most labor-intensive activities performed by the federal workforce and their suitability for automation.
AI-based applications can almost certainly improve some activities, such as filling out forms or moving objects. For others, such as caring for patients, cognitive technologies aren’t ready to replace people. (The appendix describes how we rank activities for their automation potential.)
Government employees spend a day a week on “supplemental” tasks
We estimate that federal and state workers spend at least 20 percent of their time on tasks they consider unimportant (figure 4).16 It’s a low-end estimate based on the DOL’s restrictive definition of “supplemental” tasks. If you asked government workers directly, they might give you a much higher figure.
Activities most likely to be automated
“Supplemental tasks” is a very broad description, of course, and can mean different things in different contexts. As agency executives consider incorporating AI-based technology into their work, where should they begin?
Just because a task can be automated doesn’t mean it will or should be anytime soon. Several factors tend to influence which tasks are both most conducive to automation and most likely to be automated. We’ve identified these from our research on 13-year trends in work activities as well as the widely accepted findings of labor market economists.
The factors are task importance, skill requirements, work volume, and technological barriers. We examine each below.
1. Peripheral tasks
It would be logical to assume that industries would automate their most important tasks first, to gain the maximum benefit from technology’s cost-effectiveness and reliability. The opposite is often true, however—automation usually begins with unimportant tasks or, at least, those perceived as unimportant.
The same is true for work activities. We studied 13 years of changes to the length of time spent on individual tasks, using data from the DOL O*NET database. Over the study period, tasks considered less important consumed less and less time, implying some degree of technological substitution (figure 5).
In our data set, tasks with above-average importance gained labor inputs by 4.6 percent, while tasks with below-average importance lost labor inputs by 1.3 percent. A task’s importance correlated positively and significantly (rho = .09, p < .0001) with a change in the amount of time spent on it.
Thus, we can reasonably expect agencies to begin integrating AI-based technology with tasks considered less important.
2. Middle-skilled tasks
A task’s skill requirements also affect its likelihood of automation. In employment settings, “middle-level” skills generally refer to positions requiring education beyond high school but less than a four-year college degree. More broadly, one author has defined middle-level tasks as “cognitive or manual in nature and requir[ing] one to follow precise procedures.”17 In government, various clerking positions provide good examples.
In the future, tasks requiring middle-level skills will likely be automated sooner, on average, than both high- and low-skill tasks. Many low-skilled tasks have already been replaced by previous waves of automation; those yet to be automated may pose some barrier to automation (such as requiring a worker to navigate an unpredictable physical environment), or their wages may be so low as not to justify investing in automation technology.18
It may seem counterintuitive, but this tendency to hollow out the middle of the labor market first is a well-known characteristic of technological change. Multiple studies have demonstrated how well it explains historical trends in employment and wages.19 These tasks are the easiest targets for technological replacement because enough people perform them (providing enough “volume”) and the wages paid are high enough to justify investing in the technology.
American labor market economists usually highlight skills-biased technological change by showing that employment for high-skilled workers has risen rapidly over time, while the middle-skilled workforce has shed jobs.20 And as middle-skilled workers lose jobs, they’re forced to compete for lower-skilled jobs, driving down wages.
Employment trends in government jobs follow the pattern you’d expect for skills-biased technological change. In the past decade, middle-skill government employment fell while high-skilled employment rose (figure 6).
Figure 6 shows 10 years of federal jobs data broken into five skill levels, using the DOL’s formula for “job zones.” The share of federal workers in higher-skilled jobs (job zones 4 and 5) rose in every year of the study period, while middle-skill employment (zones 2 and 3) shrank. Many of the jobs lost in government were positions such as clerks or administrative professionals. Though considered white-collar work, the tasks involved were routine enough to be automated in what Tom Davenport and Julia Kirby call the second era of automation, when computers take over the “dull jobs.”21
Since overall government employment follows the skills-biased pattern, we expect the same pattern at the task level to determine which government tasks are replaced sooner than others.22 (See sidebar “Automating middle-skill tasks to speed up testing” for a discussion of how the Army Research Laboratory is automating middle-skill tasks to free up scientists for higher-level work.)
Automating middle-skill tasks to speed up testing
In 2016, the US Army Research Laboratory (ARL) automated testing of electronic silicon wafers used in military radios and cellphones (figure 7). Testing circuits is critical to making sure that soldiers’ communication equipment functions properly, but the testing process was time-consuming and dull, requiring mid-level skills (such as those possessed by engineering graduate students) and painstaking attention to detail. Testing was viewed as a bottleneck in the production process, and the resulting delays encouraged ARL to automate the testing tasks.23
ARL developed an automated probe that can test the circuits imprinted on the wafers, freeing up the engineers to focus on core responsibilities. “Those core responsibilities such as forming hypotheses and designing experiments to test them, or designing systems using input from the data analysis, are much more difficult to perform and require high skill and creative intelligence,” in the words of ARL scientist Ryan Rudy. Automation has made testing roughly 60 times faster. Previously, an ARL intern might test 10 percent of one silicon wafer in three months; after automation, an entire wafer can be tested in two weeks.
3. High-volume tasks
A third factor determining where AI investments may be most effective is volume of business. Decades of economic research support the idea that industries with more business volume are better able to invest in expensive labor-saving technologies.24
The volume concept can help guide government executives in targeting AI investments. Since we can break government work into activities and estimate how many hours are spent on each, we can identify time-consuming tasks with high potential for automation—a useful tool for government agencies directing precious investment funds.
Figure 8 ranks state government occupations: on the horizontal axis by the automation potential of their associated tasks (low ranks being easier to automate) and on the vertical axis by employment.
Activities performed by the occupations that figure 8 shows in green (such as data entry workers) could be good starting points for AI investment.
4. Special skill requirements prevent some tasks from automation—for now
The fourth factor is the type of skill required to complete the task in question. Oxford economists Carl Frey and Michael Osborne have identified three types of “intelligence” as current challenges to AI: social intelligence, creative intelligence, and perception and manipulation. In their analysis, social intelligence tasks comprise those requiring traditionally human traits: “negotiation, persuasion and care.” Creative intelligence involves the basic human ability to generate ideas and things that are novel and interesting, whether a theory or a recipe. Perception and manipulation tasks use our ability to comprehend and interact with the chaotic patterns of real life—the irregular, object-filled worlds of airports, supermarkets, and our own homes.25
These are the tasks that will be more difficult—though not necessarily impossible—to hand over to AI technology. For now, “cognitive collaboration” between humans and machines will likely be the most efficient way of carrying out such tasks.26
We employ the Oxford study’s occupational criteria to identify jobs requiring social intelligence, creative intelligence, or perception and manipulation. Jobs requiring any of these show a lower degree of automation in a sample of 964 occupations from the O*NET database (figure 9).27
Figure 9 shows the relation of social intelligence, creative intelligence, and perception/manipulation with automation at the occupational level. We expect the same characteristics to constrain AI development at the task level as well.28
AI shows enormous potential for labor time savings
Decisions concerning how to invest in cognitive technology, and how much, could have major implications for government efficiency and effectiveness. Our research quantifies the likely upper and lower bounds of these effects over the next five to seven years. We don’t use predictive analytics to model these scenarios because cognitive technology is changing so fast that extrapolations are likely to fail. Only 12 years ago, for example, MIT researchers confidently predicted that AI would never replace human drivers.29
Instead, we use Monte Carlo simulation, a method for modeling the probability of different outcomes, to describe three different scenarios for the likely near-term effects of automation on government work.30 For each scenario, we select a base mean for the change in labor inputs to government tasks and adjust it for each task according to intrinsic task characteristics. We then simulate changes to task labor inputs by sampling from a normal distribution with the adjusted mean and a standard deviation derived from O*NET values (figure 10).
Given low, medium, and high levels of government resourcing and investment in AI, our simulations generate the scenarios shown in figure 11.
Figure 11 shows that even low levels of effort behind AI adoption could save government workforces between 2 and 4 percent of all their labor hours. With middling investment levels, much bigger savings become possible. The midrange scenario, which we consider realistic based on our experience with public- and private-sector automation projects, indicates savings of 13 to 15 percent in time requirements within five to seven years. Finally, with strong support for AI adoption, we can simulate a ceiling of potential benefits: 27 to 30 percent time savings within five to seven years. Since IT costs continue to plummet and cognitive technologies are developing rapidly, even the high-end scenario may be within reach.
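To translate these percentages into hours, the sketch below applies each scenario's range to the annual hours estimated earlier (4.3 billion federal, 108 million state). Applying the same range to both workforces is our simplifying assumption; the published scenario figures may differ by workforce.

```python
# Annual hours and scenario ranges as stated in the text.
ANNUAL_HOURS = {"federal": 4.3e9, "state": 108e6}
SCENARIOS = {"low": (0.02, 0.04), "medium": (0.13, 0.15), "high": (0.27, 0.30)}

def hours_freed(total_hours: float, share_range: tuple[float, float]) -> tuple[float, float]:
    """Low and high estimates of annual hours freed for one workforce and scenario."""
    low_share, high_share = share_range
    return total_hours * low_share, total_hours * high_share

for workforce, total in ANNUAL_HOURS.items():
    for scenario, shares in SCENARIOS.items():
        low, high = hours_freed(total, shares)
        print(f"{workforce:>7} {scenario:>6}: {low / 1e6:,.0f} to {high / 1e6:,.0f} million hours")
```

For the federal workforce alone, the midrange scenario works out to roughly 560 to 645 million hours a year, consistent with the “hundreds of millions” of hours cited in the title.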
Conclusion: Minimizing disruption and enabling innovation
Experience teaches us that AI, like other forms of technology, will likely cause disruption among government workers whose jobs it changes. But agency heads can take steps in advance to minimize the effects.31
First, agencies should provide maximum advance notice of plans to replace or augment certain tasks with AI-based applications. Good communication lowers stress for employees undergoing technological transformation.32
Second, agency technology leaders should coordinate with human-capital planners to synchronize their upgrades with workforce trends. For example, if an agency anticipates a high rate of retirement within a given occupation, it might prioritize AI investments in that area.33
Third, HR executives can cushion the effects of disruption by encouraging employees to develop new skills. Government might create program offices to oversee curricula and learning incentives relevant to cognitive technologies. Such programs could boost the skills of wide swathes of government employees. Just as foreign-language program offices boosted government skills in mission-critical languages such as Farsi and Arabic in the last two decades, AI training offices could promote targeted curricula and incentives for data analytics, machine learning, and designing human-machine interfaces. More broadly, government organizations can improve their training for human skills that are most likely to complement AI-based technology in the long run: problem solving, social intelligence, and creativity.34
Finally, after the IT department installs AI applications, the technology doesn’t run itself; often, maintaining it requires a surprising amount of human labor. Asking software vendors to design training, tuning, and maintenance interfaces for their AI products would help ensure that the employees asked to incorporate AI technology into their work can participate in its use.
We’ve seen that cognitive technologies can potentially free up millions of labor hours for government workers, with the magnitude of those savings dependent on policy decisions. But what will government workers do with those liberated hours?
Senior policymakers will have a choice—one that mirrors our perennial national debate about big versus small government. Some may see AI-based technology as a lever to shrink government workforces, aiming to deliver the same services with fewer employees. Other jurisdictions may choose to use the applications as tools for their workers, encouraging them to find new ways to use liberated work hours to improve the services they provide to citizens. The most forward-leaning jurisdictions will see cognitive technologies as an opportunity to reimagine the nature of government work itself, to make the most of complementary human and machine skills.
AI will support all these approaches. It will be up to government leaders to decide which will best serve their constituents.
Appendix: Data and methods
Data used in this research originates from two main sources: information on numbers of workers, their demographic characteristics, and their salaries collected by the federal Office of Personnel Management (OPM) and our large Midwestern state’s Department of Administrative Services; and data on tasks performed by 1,110 occupations collected by the US Department of Labor as part of its O*NET OnLine database. The first source provides information on who is in the workforce; the second tells us what they do.
Analyzing the data requires linking both sources via a crosswalk, and OPM helpfully publishes one at www.eeoc.gov/federal/directives/00-09opmcode.cfm. The Midwestern state does not provide such a crosswalk, so we created one using state employee salary data and the state’s online job classification handbook.
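A crosswalk is essentially a lookup table from one coding scheme to another. The minimal sketch below re-keys head counts from workforce job codes to O*NET occupations. The codes are hypothetical, and the sketch assumes each job code maps to a single O*NET occupation, which a real crosswalk may not guarantee.

```python
from collections import defaultdict

def employment_by_onet(employment_by_code: dict[str, int],
                       crosswalk: dict[str, str]) -> dict[str, int]:
    """Re-key head counts from workforce job codes to O*NET occupation codes.

    Several job codes may map to the same O*NET occupation; their counts are
    summed. Codes missing from the crosswalk are pooled under "UNMATCHED" so
    that coverage gaps stay visible instead of being silently dropped.
    """
    totals: dict[str, int] = defaultdict(int)
    for code, head_count in employment_by_code.items():
        totals[crosswalk.get(code, "UNMATCHED")] += head_count
    return dict(totals)

# Hypothetical codes, for illustration only.
employment = {"JOB-001": 120, "JOB-002": 30, "JOB-003": 5}
crosswalk = {"JOB-001": "ONET-A", "JOB-002": "ONET-A"}
print(employment_by_onet(employment, crosswalk))  # {'ONET-A': 150, 'UNMATCHED': 5}
```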
Establishing the current baseline
O*NET contains the results of worker surveys asking respondents to estimate the time spent on each of their work activities for 19,125 detailed, occupation-specific tasks. We convert those frequency scale ratings to annual task-hours, assuming 2,080 total person-hours per full-time equivalent, using these equivalences:
[Table: annual task-hour equivalences for O*NET frequency ratings, ranging from “Less than yearly” to “More than daily”]
We use 1,043 as the equivalent for “hourly” on the assumption that even tasks performed around the clock take up no more than half of a worker’s time, with the other half used for non-occupation-specific activities. Multiplying each value by the proportion of respondents choosing it and summing over all values for the task, we calculate the average annual hours spent on that task. This provides annual task-hours.
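The conversion described above is an expected value: each frequency category's hour equivalent, weighted by the share of respondents who chose it. In the sketch below, only the 1,043-hour value for “hourly” comes from the text; the “daily” value is a hypothetical placeholder, since the full equivalence table is not reproduced here.

```python
def annual_task_hours(response_shares: dict[str, float],
                      hours_per_category: dict[str, float]) -> float:
    """Average annual hours for one task.

    Sums (share of respondents choosing a frequency category) x (annual hours
    assigned to that category) over all categories reported for the task.
    """
    total_share = sum(response_shares.values())
    if abs(total_share - 1.0) > 1e-6:
        raise ValueError(f"response shares sum to {total_share}, expected 1.0")
    return sum(share * hours_per_category[category]
               for category, share in response_shares.items())

# "hourly" = 1,043 is from the text; "daily" = 260 is a hypothetical placeholder.
HOURS_PER_CATEGORY = {"daily": 260.0, "hourly": 1043.0}
shares = {"daily": 0.5, "hourly": 0.5}
print(annual_task_hours(shares, HOURS_PER_CATEGORY))  # 651.5
```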
We then tally the annual task-hours performed by each occupation, multiply by the workforce-specific employment in that occupation, and apply a scale factor (0.45 for the federal workforce and 0.25 for the state workforce) to estimate total task-hours performed by all members of the workforce. This provides the labor inputs to a task.
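Scaling from individual workers to a whole workforce is a weighted sum. The sketch below applies the scale factors given in the text as constants (the article does not explain how they were derived); the occupation and task names are hypothetical.

```python
# Scale factors as stated in the text.
SCALE_FACTORS = {"federal": 0.45, "state": 0.25}

def labor_inputs(task_hours_by_occupation: dict[str, dict[str, float]],
                 employment: dict[str, int],
                 workforce: str) -> dict[str, float]:
    """Total hours a workforce spends on each task in a year.

    task_hours_by_occupation maps occupation -> {task -> annual task-hours for
    one worker in that occupation}. Each entry is multiplied by the number of
    workers in the occupation and by the workforce's scale factor, and the
    products are summed per task across occupations.
    """
    scale = SCALE_FACTORS[workforce]
    totals: dict[str, float] = {}
    for occupation, task_hours in task_hours_by_occupation.items():
        head_count = employment.get(occupation, 0)
        for task, hours in task_hours.items():
            totals[task] = totals.get(task, 0.0) + hours * head_count * scale
    return totals

# Hypothetical occupations and tasks, for illustration only.
hours = {"clerk": {"record data": 100.0, "answer calls": 50.0},
         "inspector": {"record data": 20.0}}
staff = {"clerk": 10, "inspector": 5}
print(labor_inputs(hours, staff, "state"))  # {'record data': 275.0, 'answer calls': 125.0}
```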
The 19,125 O*NET tasks are further linked to more than 2,000 “detailed work activities,” 331 “intermediate work activities,” and 37 “general work activities,” allowing us to analyze annual task-hours and labor inputs for work tasks at any desired level of specificity.
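Because every task is linked upward through this hierarchy, any total computed at the task level can be rolled up to a coarser level by summing. If a task is linked to more than one parent activity, the sketch below splits its hours equally among them; that split is our assumption, not necessarily the authors' treatment. The activity names are hypothetical.

```python
from collections import defaultdict

def roll_up(labor: dict[str, float], links: dict[str, list[str]]) -> dict[str, float]:
    """Aggregate labor inputs one level up the activity hierarchy.

    links maps each child (e.g., a task) to its parent activities. When a child
    has several parents, its hours are split equally among them (an assumption).
    """
    totals: dict[str, float] = defaultdict(float)
    for child, hours in labor.items():
        parents = links[child]
        for parent in parents:
            totals[parent] += hours / len(parents)
    return dict(totals)

# Hypothetical hierarchy: tasks -> detailed activities -> intermediate activities.
task_labor = {"t1": 100.0, "t2": 40.0}
task_to_dwa = {"t1": ["dwa1", "dwa2"], "t2": ["dwa2"]}
dwa_to_iwa = {"dwa1": ["iwa1"], "dwa2": ["iwa1"]}

dwa_labor = roll_up(task_labor, task_to_dwa)  # {'dwa1': 50.0, 'dwa2': 90.0}
iwa_labor = roll_up(dwa_labor, dwa_to_iwa)    # {'iwa1': 140.0}
print(dwa_labor, iwa_labor)
```

Equal splitting keeps total hours unchanged at every level, so shares computed at any level of the hierarchy still add up to the workforce total.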
Understanding changes in task labor inputs
The O*NET program surveys workers in each occupation repeatedly, but at irregular intervals. For 13,356 of the 19,125 detailed O*NET tasks (70 percent), O*NET reports two or more observations of task frequency at different time points. In this sample, the earliest observation of a task took place in 2003; the latest was in 2016. The length of time between observations averaged 7.03 years, with a minimum of two years and a maximum of 13.
Given two observations of the labor inputs to a task at time 1 (t1) and time 2 (t2), we calculate the percent change in annual task-hours for that task. We use the formula (t2-t1)/average(t1, t2), so that positive values indicate growth, to calculate percent change for this and other time trends in this paper.
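This midpoint (or “arc”) percent change is symmetric: a doubling and a halving produce changes of equal size and opposite sign, and for nonnegative hours the result always falls between −200 and +200 percent. The sketch below puts the later observation first, so that positive values mean a task gained labor inputs, the sign convention the article uses when reporting gains and losses.

```python
def midpoint_pct_change(t1: float, t2: float) -> float:
    """Change from t1 to t2 relative to the average of the two observations.

    Returned as a fraction (0.25 means 25 percent). Positive values mean the
    task's labor inputs grew. For nonnegative inputs the result lies between
    -2 and +2.
    """
    mean = (t1 + t2) / 2
    if mean == 0:
        raise ValueError("both observations are zero; the change is undefined")
    return (t2 - t1) / mean

print(midpoint_pct_change(100, 200))  # about 0.667
print(midpoint_pct_change(100, 50))   # about -0.667
print(midpoint_pct_change(0, 10))     # 2.0
```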
A decrease in labor inputs to a task over time can have many explanations, including structural changes to the occupation and changes in customer demand; dry cleaners don’t do much sewing anymore. One explanation, however, is that technology has substituted for part of the labor of the task.
We calculate the correlation between percentage change in task labor input and task importance using Pearson product-moment correlation to demonstrate that, on average, peripheral tasks are automated before core tasks.
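For readers who want to reproduce this step, the Pearson coefficient is the covariance of the two variables divided by the product of their standard deviations. The plain-Python sketch below computes r only; in practice a statistics library such as SciPy's `scipy.stats.pearsonr`, which also returns a p-value, would normally be used. The example data are invented to illustrate the calculation.

```python
import math

def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson product-moment correlation between two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length, with at least two points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # The 1/n factors in covariance and standard deviations cancel, so raw sums suffice.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined when a sample is constant")
    return cov / (spread_x * spread_y)

# Invented data: task importance ratings and midpoint percent change in labor inputs.
importance = [1.5, 2.0, 3.0, 3.5, 4.5]
pct_change = [-0.30, 0.10, -0.05, 0.20, 0.15]
print(round(pearson_r(importance, pct_change), 3))
```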
We measure the standard deviation of the changes to task labor input and use that value to constrain the Monte Carlo simulation of levels of AI investment described in the following section.
Ranking activities and occupations according to automation potential
The realization that tasks requiring social intelligence, creative intelligence, or perception and manipulation are less easy to adapt to AI technology allows us to rank O*NET’s 331 intermediate work activities (IWAs) according to their automation potential.
We code each of the 19,125 O*NET detailed tasks with three binary variables according to whether the associated occupation requires social intelligence, creative intelligence, or perception and manipulation. We use the same O*NET indicators that Carl Frey and Michael Osborne used35 to assign those binary values. Each IWA is linked through O*NET’s database structure to one or more tasks. For each IWA, we average the social-intelligence values of its linked tasks and call this the IWA’s social index, which measures the share of the IWA’s tasks performed by occupations requiring social intelligence. We do the same to build a creative index and a perception/manipulation index. We then sum the three indices for each IWA and rank the IWAs by that sum. IWAs with lower combined index values are easier to automate than activities with higher index values.
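The index construction above can be sketched as follows. To keep the example short, it assumes each task links to a single IWA; the IWA labels and skill flags are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Task:
    iwa: str          # intermediate work activity the task is linked to
    social: bool      # occupation requires social intelligence
    creative: bool    # occupation requires creative intelligence
    perception: bool  # occupation requires perception and manipulation

def combined_index(tasks: list[Task]) -> dict[str, float]:
    """Sum of the social, creative, and perception/manipulation indices per IWA.

    Each index is the share of the IWA's tasks whose occupation requires that
    kind of intelligence, so the combined index ranges from 0 to 3.
    """
    by_iwa: dict[str, list[Task]] = {}
    for task in tasks:
        by_iwa.setdefault(task.iwa, []).append(task)
    index = {}
    for iwa, group in by_iwa.items():
        n = len(group)
        social = sum(t.social for t in group) / n
        creative = sum(t.creative for t in group) / n
        perception = sum(t.perception for t in group) / n
        index[iwa] = social + creative + perception
    return index

def rank_by_automation_potential(index: dict[str, float]) -> list[str]:
    """IWAs ordered from easiest to hardest to automate (lowest index first)."""
    return sorted(index, key=lambda iwa: index[iwa])

tasks = [Task("file records", False, False, False),
         Task("file records", True, False, False),
         Task("counsel clients", True, True, False),
         Task("counsel clients", True, False, True)]
idx = combined_index(tasks)
print(idx)                                # {'file records': 0.5, 'counsel clients': 2.0}
print(rank_by_automation_potential(idx))  # ['file records', 'counsel clients']
```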
We rank the 331 IWAs according to automation potential and combine that ranking with employment to rank occupations. We do so by linking each occupation to the IWAs it performs. We use the average combined automation index for all the IWAs linked to an occupation, weighted by number of task-hours spent on each IWA, to represent the automation potential of the activities of the occupation. We rank occupations according to combined IWA automation index and employment.
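The occupation-level score is a task-hour-weighted average of the IWA indices. The text does not say exactly how the index and employment are combined into a single ranking, so the sorting rule below (lowest index first, larger employment breaking ties) is a hypothetical choice; all names and numbers are illustrative.

```python
def occupation_index(iwa_hours: dict[str, float], iwa_index: dict[str, float]) -> float:
    """Task-hour-weighted average of combined IWA indices for one occupation."""
    total_hours = sum(iwa_hours.values())
    if total_hours == 0:
        raise ValueError("occupation has no task-hours")
    return sum(hours * iwa_index[iwa] for iwa, hours in iwa_hours.items()) / total_hours

def rank_occupations(occupations: dict[str, dict[str, float]],
                     iwa_index: dict[str, float],
                     employment: dict[str, int]) -> list[tuple[str, float, int]]:
    """(occupation, index, employment) sorted by index, then employment (descending).

    The tie-breaking rule is a hypothetical choice, not the article's stated method.
    """
    rows = [(occ, occupation_index(hours, iwa_index), employment.get(occ, 0))
            for occ, hours in occupations.items()]
    return sorted(rows, key=lambda row: (row[1], -row[2]))

iwa_index = {"file records": 0.5, "counsel clients": 2.0}
occupations = {"records clerk": {"file records": 300.0, "counsel clients": 100.0},
               "caseworker": {"counsel clients": 50.0}}
employment = {"records clerk": 1000, "caseworker": 5000}
for row in rank_occupations(occupations, iwa_index, employment):
    print(row)
```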
When applied to the 669 federal occupation series established by the Office of Personnel Management, this method yields the following 20 jobs with both the highest automation potential and highest employment (figure 12).
Monte Carlo simulation of AI technology adoption scenarios
We begin with the data set of 19,125 detailed O*NET task descriptions, representing each using intrinsic task characteristics discussed above: task importance and the binary variables for whether the occupation requires social intelligence, creative intelligence, or perception and manipulation.
For the three levels of effort in the scenarios, we choose a base mean for the normal distribution as shown in figure 10 and set the standard deviation to 0.63 based on the percentage changes to 13,356 task labor inputs described above.
We run the simulation as follows. For each task, if the task requires social intelligence, creative intelligence, or perception/manipulation, we set the distribution mean to zero. Otherwise, we set the distribution mean to the base mean times the reciprocal of task importance, on a scale of one to six. We then sample percentage change to the annual task-hours from that distribution and store the results. We report scenario results by running the simulation 10 times and averaging the results.
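The simulation loop can be sketched as below. The base means for the three scenarios appear in figure 10 and are not reproduced here, so the example uses a placeholder value; the standard deviation of 0.63 and the ten runs come from the text. How task-level draws are aggregated into the scenario percentages is not spelled out, so this sketch assumes a labor-weighted average of the sampled changes.

```python
import random
from dataclasses import dataclass
from typing import Optional

@dataclass
class SimTask:
    labor_hours: float  # current labor inputs to the task
    importance: float   # task importance rating
    protected: bool     # needs social/creative intelligence or perception/manipulation

def simulate_once(tasks: list[SimTask], base_mean: float, sd: float,
                  rng: random.Random) -> float:
    """One draw: labor-weighted average of sampled percent changes across tasks."""
    total_hours = sum(t.labor_hours for t in tasks)
    weighted = 0.0
    for t in tasks:
        # Protected tasks are centered on no change; others on base mean / importance.
        mean = 0.0 if t.protected else base_mean / t.importance
        weighted += t.labor_hours * rng.gauss(mean, sd)
    return weighted / total_hours

def run_scenario(tasks: list[SimTask], base_mean: float, sd: float = 0.63,
                 runs: int = 10, seed: Optional[int] = None) -> float:
    """Average the results of several simulation runs, as described in the text."""
    rng = random.Random(seed)
    draws = [simulate_once(tasks, base_mean, sd, rng) for _ in range(runs)]
    return sum(draws) / len(draws)

# Hypothetical tasks and a placeholder base mean (not the article's values).
tasks = [SimTask(labor_hours=100.0, importance=2.0, protected=False),
         SimTask(labor_hours=300.0, importance=4.0, protected=True)]
print(f"{run_scenario(tasks, base_mean=-0.4, seed=42):+.3f}")
```

Setting `sd=0.0` makes the result deterministic, which is handy for checking the logic: the unprotected task's mean is -0.4 / 2 = -0.2, and weighting by hours gives (100 × -0.2) / 400 = -0.05.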