Generative AI and government work: An in-depth analysis of 19,000 tasks

Deloitte's analysis reveals three criteria that can help determine which tasks could be assigned to generative AI tools and when government occupations could feel pressure to adopt them.

Tasha Austin

United States

Joe Mariani

United States

William D. Eggers

United States

Edward Van Buren

United States

We are awash in examples of what generative artificial intelligence can produce: near-human-quality text, images, and even video. Yet, there’s less evidence of how generative AI will impact how work is done. A lack of detail is especially challenging for government, where a broad range of agencies perform a wide variety of tasks, and those differences really matter when it comes to AI. For government leaders, this can lead to uncertainty that stalls adoption of generative AI and even other automation tools that could otherwise have benefitted the public.

To help cut through that uncertainty and get the broadest possible perspective on how and where generative AI can impact government work, we examined more than 19,000 tasks collected by the US Department of Labor to represent the wider US workforce.1 By analyzing how much accuracy, creative difficulty, and context variability (how much a task changes in different scenarios) are needed to accomplish each task, we were able to assess which tasks could be amenable to which types of automation.

We identified three categorizations that can help government leaders make informed, strategic decisions about how to implement generative AI in their organizations.

  1. Tasks with moderately high creative difficulty, moderate context variability, and moderate accuracy could be good candidates for gen AI. Take, for example, a task like recording regulatory compliance.
  2. Tasks with high accuracy and low context variability (like data entry) are likely good for other forms of automation, ranging from robotic process automation to physical robots to other forms of machine learning.
  3. Finally, humans still outperform AI at dealing with tasks that have high context variability, especially tasks that have a high social aspect (like coaching workers) or a physical component (like maintaining vehicles).

 

Figure 1

Your tools depend on your tasks

Dark blue tasks can be a good fit for gen AI

Examples include preparing speeches, summarizing laws, or making reports. Prior to the release of gen AI, nearly all of these creatively intensive tasks could have only been completed by humans.

Teal tasks are amenable to previous generations of AI and automation

These tasks typically harness automation’s abilities to handle large volumes of data with precision to accomplish tasks such as predicting maintenance failures or calculating costs.

Purple tasks still need human judgment

These tasks involve high context variability, especially when that context involves social interaction or physical movement. For example, training other workers and making strategic decisions for an organization are tasks still best left to human judgment.

But beware of easy answers. Just because a task is shown as a particular color does not mean it’s always the best fit for that automation tool. The color coding simply suggests that if you’re undertaking a dark blue or teal task, it might be worthwhile to explore how an automation tool could help.

Different automation tools have different strengths and weaknesses—and gen AI is no exception

Generative AI is a powerful tool that can do many things, but just because it can doesn’t mean it should.

While generative AI can create new content in ways that other automation tools can’t, it may occasionally do so at the cost of accuracy—for example, the now-infamous hallucinations.2

You can see gen AI’s strengths and weaknesses visually in the graphic below. As you move from right to left on the creative difficulty axis, you move into creative tasks like preparing whole reports that previous generations of AI could not handle, but gen AI can.

But moving from bottom to top on the accuracy axis also shows generative AI’s weaknesses: tasks that require significant levels of accuracy, such as making eligibility determinations for benefits like unemployment insurance or a small business loan. Gen AI will usually give you an answer for such tasks, but it may not always be correct, which is not acceptable for tasks that demand accuracy.

Because different occupations do different tasks, the impact of gen AI will vary widely

With different industries and occupations performing different tasks in their work, it may be natural to see a variation in how much gen AI is likely to impact how that work is done. With government performing such a wide range of work, understanding the variation in gen AI impact is crucial for adoption. At a high level, more knowledge-based occupations such as education or management are seeing greater immediate impact from gen AI than more physical-based occupations such as logistics or maintenance.

Even within an industry, variation can help shine a light on exactly how gen AI is being used. Within education, for example, teaching professions have a high percentage of tasks that are amenable to gen AI. As a result, they are already grappling both with student use of gen AI and with how to use gen AI in instruction and research themselves.3

In contrast to teaching roles, noninstructional roles in education (such as administrative roles in finance and human resources) have many automatable tasks, but many of those tasks may be more suited to other forms of automation and may already have been automated. The result is less immediate pressure to adopt gen AI, at least for the time being.

With fewer tasks amenable to gen AI, noninstructional roles in education may not feel as strong a need to use standalone gen AI tools. However, over the next few years, gen AI models are likely to become less expensive and less computationally intensive, allowing them to be more easily embedded into a variety of tools that people already use to do their jobs (from accounting software to HR tools to contract templates).4 Embedding gen AI into these tools can make them easier to use (by auto-generating reports, for example) or improve productivity (for example, by allowing users to query huge volumes of data using plain language).

The result is that there are likely to be two waves of gen AI adoption: an immediate wave for those with many tasks already amenable to the tool, like teachers and professors, and a second wave a few years later for those who will make use of future versions of gen AI-enabled tools.

Even occupations with lots of physical work are not immune

The dual-wave adoption of gen AI also has implications for government workers with more physical work. Government workers in maintenance, manufacturing, construction, logistics, and similar occupations may not see much immediate impact from gen AI but are likely to experience the second wave of adoption.

Consider workers in government shipyards, highway maintenance divisions, or sanitation departments. The bulk of their day-to-day work is physical in nature, but they still often need to receive work orders, track tools, or record maintenance fixes. Embedding gen AI in maintenance management, inventory tracking, and other systems that these workers use every day can improve both the ease and efficiency of their work.

So, while not every government worker may be using gen AI immediately, most will likely find gen AI touching their work eventually.

Work is about more than just accomplishing individual tasks

It’s important to remember that most work activities involve more than one task. Work activities that create value for the organization are likely to feature several tasks, usually several different types of tasks amenable to different automation tools.

Figure 6

Value comes from workflows of several, often very different, tasks

Take the work of a government lawyer, for example. To make an argument in court, government lawyers may need to do several tasks:

  1. Help set policy,
  2. Gather evidence about previous cases,
  3. Analyze those cases for relevant evidence, and
  4. Make a judgment and argue that judgment in court.

Each of those tasks requires different skills, making them amenable to different types of automation.

Getting the work done would require not one monolithic AI tool but several smaller ones, working together with and supervised by human judgment.

The future of gen AI, then, is embedded and ubiquitous. Small, narrowly scoped gen AI tools are likely to be embedded within a wide range of the tools we already use today, working alongside other forms of automation to help make our work faster and more productive.

What does this mean for your AI strategy?

So, what can government leaders do to help make sure this future of AI benefits the public? There are three ways that this task-level analysis can inform how government delivers value.

Efficiency: Automating individual tasks can help improve the efficiency of government. For example, teachers could use generative AI to generate bibliographies from their lesson plans, saving hours that could be spent helping students instead of typing.

Effectiveness: Work is made up of more than just single tasks, and adapting workflows to use a set of different automation tools, each taking on the tasks to which it is best suited, can increase how effectively government accomplishes its mission, not just how quickly. The earlier case of government lawyers is a good example: robotic process automation, generative AI, and human judgment can all come together to make writing legal briefs not just faster, but also better.

Efficacy: Efficiency and effectiveness are limited to only improving tasks already done today. As organizations become more familiar with generative AI, they can also find entirely new ways of working that can deliver better mission outcomes. For example, the New York City Fire Department has used AI to create a new pathway to save firefighters: an AI-enabled tool to prioritize building inspections of structures most likely to have unauthorized modifications, which may pose a danger during fires.5

How could generative AI help you? The answer is most likely to be a blend of all three benefits. The art of strategy is using tools like task-level analysis to find the right opportunities to improve efficiency, effectiveness, and efficacy in your mission.

Methodology

Our analysis of the automatability of the work tasks had three principal steps:

1. Score all O*Net tasks

Beginning with all 19,000 tasks in the Department of Labor’s O*Net database, we created three indices: the accuracy, creative difficulty, and context variability needed to execute a task. Those three indices were chosen based on existing literature on the strengths and weaknesses of different automation tools. For each index, we selected knowledge, skills, and abilities that represented those traits in use. This allowed us to score each of the 19,000 tasks based on the average knowledge, skills, and abilities in occupations where those skills were important.
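As a rough illustration of this scoring step, the sketch below averages a chosen set of knowledge, skill, and ability ratings over the occupations that perform a task. The article does not publish which descriptors fed each index or exactly how the averages were weighted, so the function name `raw_task_index`, the descriptor names, and the toy ratings are all hypothetical.

```python
from statistics import mean

def raw_task_index(task_occupations, occupation_ratings, descriptors):
    """Average the chosen descriptor ratings over the occupations that perform a task.

    task_occupations: occupations in which the task appears (hypothetical input).
    occupation_ratings: {occupation: {descriptor: rating}} (hypothetical input).
    descriptors: the knowledge/skill/ability names standing in for one index.
    Returns None when no rating is available.
    """
    values = [
        occupation_ratings[occ][d]
        for occ in task_occupations
        for d in descriptors
        if d in occupation_ratings.get(occ, {})
    ]
    return mean(values) if values else None

# Toy ratings, invented purely for illustration.
ratings = {
    "teacher": {"Originality": 4.0, "Mathematics": 3.0},
    "clerk": {"Originality": 2.0, "Mathematics": 4.0},
}
creative_raw = raw_task_index(["teacher", "clerk"], ratings, ["Originality"])
```

An unweighted mean is the simplest choice here; a closer reproduction might weight each occupation by how important the task or skill is in that occupation.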

2. Normalize scores

Normalizing the scores to a 1-to-10 scale allowed us to analyze tasks based on their relative need for each index. For example, every task will vary with context, but normalizing allowed us to easily find those that varied the most from one instance to another.
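The article does not say which normalization was used. One common option that produces a 1-to-10 scale is linear min-max rescaling, sketched below under that assumption; the function name `normalize_1_to_10` and the sample scores are hypothetical.

```python
def normalize_1_to_10(raw_scores):
    """Linearly rescale raw index scores so the lowest maps to 1 and the highest to 10.

    raw_scores: {task: raw score}. Min-max scaling is an assumption, not the
    article's stated method.
    """
    if not raw_scores:
        return {}
    lo, hi = min(raw_scores.values()), max(raw_scores.values())
    if hi == lo:
        # Every task scored the same, so place them all at the midpoint.
        return {task: 5.5 for task in raw_scores}
    return {task: 1 + 9 * (v - lo) / (hi - lo) for task, v in raw_scores.items()}

# Invented raw scores for three tasks on one index.
scaled = normalize_1_to_10({"data entry": 2.0, "coaching": 4.0, "reporting": 3.0})
```

Because the rescaling is relative to the lowest and highest scoring tasks, a score near 10 means a task needs that trait more than most other tasks do, not that it needs it in an absolute sense.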

3. Assign automation suitability

Finally, we used existing literature on the strengths and weaknesses of different automation tools to create windows in which each tool (generative AI; other forms of automation such as robotic process automation, robotics, and more; and human judgment) is likely to be the best fit. For example, tasks that demanded moderate (25th to 75th percentile) creative difficulty and accuracy along with low context variability (less than 25th percentile) are one set of tasks amenable to generative AI.
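The window logic can be sketched as a small rule set over percentile ranks. Only the generative AI window (25th to 75th percentile for creative difficulty and accuracy, below the 25th for context variability) comes from the text above. Treating "high" as above the 75th percentile, the order in which the rules are checked, the "no clear fit" fallback, and the function names are assumptions made for illustration.

```python
def percentile_rank(value, population):
    """Percent of the population strictly below value (one of several common definitions)."""
    below = sum(1 for x in population if x < value)
    return 100.0 * below / len(population)

def suggest_tool(accuracy_pct, creative_pct, context_pct):
    """Map a task's percentile ranks on the three indices to a candidate tool."""
    if context_pct > 75:  # assumed cutoff for "high" context variability
        return "human judgment"
    if context_pct < 25:
        # Window stated in the article: moderate creativity and accuracy, low context variability.
        if 25 <= accuracy_pct <= 75 and 25 <= creative_pct <= 75:
            return "generative AI"
        if accuracy_pct > 75:  # assumed cutoff for "high" accuracy
            return "other automation"
    return "no clear fit"

# Invented percentile ranks for three tasks.
examples = {
    "summarize a law": suggest_tool(50, 50, 10),
    "enter data": suggest_tool(90, 10, 10),
    "coach a worker": suggest_tool(50, 50, 90),
}
```

The article describes this as "one set" of tasks amenable to generative AI, so the full analysis presumably used more windows than the single one reproduced here.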

Endnotes

  1. National Center for O*NET Development, O*Net Online, “Homepage,” accessed March 19, 2024.
  2. Karen Weise and Cade Metz, “When AI chatbots hallucinate,” The New York Times, May 9, 2023.
  3. Khari Johnson, “Teachers are going all in on generative AI,” Wired, September 15, 2023.
  4. Recent Deloitte analysis of publicly available announcements suggests that 100% of the 50 largest enterprise software companies are planning to offer a version of their software that has generative AI features. Duncan Stewart, Baris Sarer, Gillian Crossan, and Jeff Loucks, “Generative AI and enterprise software: What’s the revenue uplift potential?” Deloitte Insights, accessed March 19, 2024.
  5. Christina Dorfhuber, John O'Leary, and Sushumna Agarwal, “Surviving the pandemic budget shortfalls,” Deloitte, September 9, 2020.

Acknowledgments

The authors would like to thank Sandeep Vellanki and Akshay Prabhu Jadhav from Deloitte Insight’s Data Science and Survey Advisory Group for their assistance in developing the visualization and deploying the analysis; and Shiv Kulshrestha who provided valuable support with Gen AI Studio. Finally, we would like to thank Annalyn Kurtz, Kavita Majumdar, Molly Piersol, Joanie Pearson, Melissa O’Brien, and the entire Deloitte Insights team who helped make this interactive a reality.

Cover image by: Molly Piersol