We are awash in examples of what generative artificial intelligence can produce: near-human-quality text, images, and even video. Yet there is far less evidence of how generative AI will change how work is done. That lack of detail is especially challenging for government, where a broad range of agencies perform a wide variety of tasks, and those differences matter greatly when it comes to AI. For government leaders, the result can be uncertainty that stalls adoption of generative AI and even of other automation tools that could otherwise benefit the public.
To help cut through that uncertainty and get the broadest possible perspective on how and where generative AI can affect government work, we examined more than 19,000 tasks collected by the US Department of Labor to represent the wider US workforce.1 By analyzing how much accuracy, creative difficulty, and context variability (how much a task changes from one scenario to another) each task requires, we were able to assess which tasks could be amenable to which types of automation.
We identified three categorizations that can help government leaders make informed, strategic decisions about how to implement generative AI in their organizations.
But beware of easy answers. Just because a task is shown as a particular color does not mean it’s always the best fit for that automation tool. The color coding simply suggests that if you’re undertaking a dark blue or teal task, it might be worthwhile to explore how an automation tool could help.
Generative AI is a powerful tool that can do many things, but just because it can doesn’t mean it should.
While generative AI can create new content in ways that other automation tools can’t, it may occasionally do so at the cost of accuracy—for example, the now-infamous hallucinations.2
You can see gen AI’s strengths and weaknesses visually in the graphic below. As you move from right to left on the creative difficulty axis, you move into creative tasks, such as preparing whole reports, that previous generations of AI could not handle but gen AI can.
But moving from bottom to top on the accuracy axis also shows generative AI’s weaknesses: tasks that require significant levels of accuracy such as making eligibility determinations for benefits like unemployment insurance or a small business loan. Gen AI will usually give you an answer for such tasks, but it may not always be correct—something that is not acceptable for tasks that demand accuracy.
Because different industries and occupations perform different tasks, it is natural to see variation in how much gen AI is likely to affect how their work is done. With government performing such a wide range of work, understanding that variation is crucial for adoption. At a high level, knowledge-based occupations such as education or management are seeing greater immediate impact from gen AI than more physical occupations such as logistics or maintenance.
Even within an industry, variation can help shine a light on exactly how gen AI is being used. Within education, for example, teaching professions have a high percentage of tasks that are amenable to gen AI. As a result, they are already grappling both with student use of gen AI and with how to use gen AI in their own instruction and research.3
In contrast to teaching roles, noninstructional roles in education (such as administrative roles in finance and human resources) have many automatable tasks, but many of those tasks may be more suited to other forms of automation and may already have been automated. The result is less immediate pressure to adopt gen AI, at least for the time being.
With fewer tasks amenable to gen AI, noninstructional roles in education may not feel as strong a need to use standalone gen AI tools. However, over the next few years, gen AI models are likely to become less expensive and less computationally intensive, allowing them to be more easily embedded into a variety of tools that people already use to do their jobs (from accounting software to HR tools to contract templates).4 Embedding gen AI into these tools can make them easier to use (by auto-generating reports, for example) or improve productivity (for example, by allowing users to query huge volumes of data using plain language).
The result is that there are likely to be two waves of gen AI adoption: an immediate wave for those with many tasks already amenable to the tool, like teachers and professors, and a second wave a few years later for those who will make use of future versions of gen AI-enabled tools.
The dual-wave adoption of gen AI also has implications for government workers with more physical work. Government workers in maintenance, manufacturing, construction, logistics, and similar occupations may not see much immediate impact from gen AI but are likely to experience the second wave of adoption.
Consider workers in government shipyards, highway maintenance divisions, or sanitation departments. The bulk of their day-to-day work is physical in nature, but they still often need to receive work orders, track tools, or record maintenance fixes. Embedding gen AI in maintenance management, inventory tracking, and other systems that these workers use every day can improve both the ease and efficiency of their work.
So, while not every government worker may be using gen AI immediately, most will likely find gen AI touching their work eventually.
It’s important to remember that most work activities involve more than one task. Work activities that create value for the organization are likely to feature several tasks, usually of different types that are amenable to different automation tools.
To make an argument in court, for example, government lawyers may need to perform several tasks, each suited to a different tool: some to robotic process automation, some to generative AI, and some to human judgment.
The future of gen AI, then, is embedded and ubiquitous. Small, narrowly scoped gen AI tools are likely to be embedded within a wide range of the tools we already use today, working alongside other forms of automation to help make our work faster and more productive.
So, what can government leaders do to help make sure this future of AI benefits the public? There are three ways that this task-level analysis can inform how government delivers value.
Efficiency: If you automate individual tasks, it can help improve the efficiency of government. For example, teachers could use generative AI to generate bibliographies from their lesson plans, saving hours that could be spent helping students instead of typing.
Effectiveness: Work is made up of more than just single tasks, and adapting workflows to use a set of different automation tools, each taking on the tasks to which it is best suited, can increase how effectively government accomplishes its mission, not just how quickly. Our earlier example of government lawyers shows how robotic process automation, generative AI, and human judgment can come together to make writing legal briefs not just faster but also better.
Efficacy: Efficiency and effectiveness are limited to improving tasks already done today. As organizations become more familiar with generative AI, they can also find entirely new ways of working that deliver better mission outcomes. For example, the New York City Fire Department has used AI to create a new way to protect firefighters: an AI-enabled tool that prioritizes building inspections of structures most likely to have unauthorized modifications, which can pose a danger during fires.5
How could generative AI help you? The answer is most likely a blend of all three benefits. The art of strategy is using tools like task-level analysis to find the right opportunities to improve efficiency, effectiveness, and efficacy in your mission.
Our analysis of the automatability of work tasks had three principal steps:
1. Score all O*Net tasks
Beginning with the more than 19,000 tasks in the Department of Labor’s O*Net database, we created indices of the accuracy, creative difficulty, and context variability needed to execute each task. These three indices were chosen based on the existing literature on the strengths and weaknesses of different automation tools. For each index, we selected the knowledge, skills, and abilities (KSAs) that represent those traits in practice. This allowed us to score each task based on the average importance of those KSAs in the occupation to which the task belongs.
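As a rough illustration of this step, the sketch below shows one way index scores could be derived from occupation-level KSA importance ratings. The file names, column names, and the specific KSAs mapped to each index are illustrative assumptions, not the actual schema or mapping used in our analysis.

```python
# A minimal sketch of step 1, under assumed (hypothetical) file names, column
# names, and KSA-to-index mappings.
import pandas as pd

# Illustrative only: KSAs assumed to represent each index
INDEX_KSAS = {
    "accuracy": ["Mathematics", "Quality Control Analysis"],
    "creative_difficulty": ["Originality", "Fluency of Ideas"],
    "context_variability": ["Adaptability/Flexibility", "Complex Problem Solving"],
}

# Hypothetical inputs:
#   ksa_ratings.csv -> occupation_code, ksa_name, importance (one row per occupation-KSA pair)
#   tasks.csv       -> occupation_code, task_id, task_text (one row per task)
ksa_ratings = pd.read_csv("ksa_ratings.csv")
tasks = pd.read_csv("tasks.csv")

def occupation_index(ksa_df: pd.DataFrame, ksa_names: list[str]) -> pd.Series:
    """Average the importance of the selected KSAs within each occupation."""
    subset = ksa_df[ksa_df["ksa_name"].isin(ksa_names)]
    return subset.groupby("occupation_code")["importance"].mean()

# Score every task with its occupation's average rating on each index
for index_name, ksa_names in INDEX_KSAS.items():
    tasks[index_name] = tasks["occupation_code"].map(occupation_index(ksa_ratings, ksa_names))
```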
2. Normalize scores
Normalizing the scores to a 1-to-10 scale allowed us to analyze tasks based on their relative need for each index. For example, every task varies with context, but normalization allowed us to easily find those that vary the most from one instance to another.
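Continuing that sketch, the normalization could be a simple min-max rescaling of each index to a 1-to-10 range; the exact normalization method is not specified above, so min-max is an assumption here.

```python
# A minimal sketch of step 2 (continuing the sketch above): min-max rescaling
# of each index to a 1-to-10 range, so tasks can be compared by their relative
# need for accuracy, creative difficulty, and context variability.
import pandas as pd

def normalize_1_to_10(series: pd.Series) -> pd.Series:
    lo, hi = series.min(), series.max()
    return 1 + 9 * (series - lo) / (hi - lo)

for index_name in INDEX_KSAS:  # INDEX_KSAS and tasks come from the step 1 sketch
    tasks[index_name] = normalize_1_to_10(tasks[index_name])
```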
3. Assign automation suitability
Finally, we used existing literature on the strengths and weaknesses of different automation tools to define the windows in which each tool (generative AI; other forms of automation such as robotic process automation and robotics; or human judgment) is likely to be best suited. For example, tasks that demand moderate creative difficulty and accuracy (25th to 75th percentile) along with low context variability (below the 25th percentile) form one set of tasks amenable to generative AI.
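The percentile thresholds in the example above suggest a rule-based classification along the lines of the sketch below. Only the generative AI window mirrors the example given; the rules shown for human judgment and other automation are placeholders to illustrate the pattern, not the thresholds we actually used.

```python
# A minimal sketch of step 3 (continuing the sketches above): percentile-based
# "windows" that map each task to the tool it may be most amenable to. Only the
# generative AI rule follows the example in the text; the others are assumed.
import pandas as pd

q25 = tasks[list(INDEX_KSAS)].quantile(0.25)
q75 = tasks[list(INDEX_KSAS)].quantile(0.75)

def suitability(row: pd.Series) -> str:
    moderate_creativity = q25["creative_difficulty"] <= row["creative_difficulty"] <= q75["creative_difficulty"]
    moderate_accuracy = q25["accuracy"] <= row["accuracy"] <= q75["accuracy"]
    low_variability = row["context_variability"] < q25["context_variability"]
    if moderate_creativity and moderate_accuracy and low_variability:
        return "generative AI"
    if row["accuracy"] > q75["accuracy"] or row["creative_difficulty"] > q75["creative_difficulty"]:
        return "human judgment"   # placeholder rule
    return "other automation"     # placeholder rule

tasks["suited_to"] = tasks.apply(suitability, axis=1)
```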