The chief data officer in government: A CDO playbook

12 June 2019

In today’s data economy, governments should rethink what data can accomplish and work toward facilitating platforms where it can be used efficiently to further the greater public good, say Deloitte’s Bill Eggers and the Beeck Center’s Sonal Shah.

 
“When you look at their responsibilities, they really have a huge level of responsibilities in terms of safeguarding government data.”

—Bill Eggers

SONAL SHAH: It is transparency …

BILL EGGERS: Managing privacy and protecting citizens’ information

SONAL SHAH: It's waste, fraud, and abuse.

BILL EGGERS:  How to use machine learning across the enterprise, how to develop an AI strategy …

SONAL SHAH: It's also about how do we make data more useful for people to use and to solve problems in their communities?

TANYA OTT: Okay, that is a big job. Who is this superhuman who fills it? 

TANYA OTT: We’ll tell you, in a moment. But first, let me say, you’re listening to the Press Room, where we talk about some of the biggest issues facing businesses today. I’m Tanya Ott and joining me today are Bill Eggers …

BILL EGGERS: I'm the executive director of the Deloitte Center for Government Insights.

TANYA OTT:  And …

SONAL SHAH: I'm Sonal Shah. I am the executive director and a professor of practice at Georgetown University's Beeck Center.

TANYA OTT: Bill and Sonal are coauthors of The CDO Playbook, a guide for chief data officers. For the last decade, government has focused on making data more open and easily [accessible] to the public. I asked Sonal to start by explaining what kind of data we’re talking about, and what's behind this push for transparency.

SONAL SHAH: I am happy to do this. I was in government at the beginning of the Obama administration when we pushed the first executive order through for open data and more transparent data. I think there are three things driving this push on data. One [is that] government has been making data publicly available for decades, but it's data we don't necessarily think about: things like weather data and GPS data. There's also economic data, whether it's from the census or the Bureau of Labor Statistics. And this push was more to say that there are so many other datasets in government, and if we can make those publicly available (health care data, education data), [then] nonprofits, the private sector, and citizens can use that data to solve problems. And this push for open data is not just about transparency, but it's also about how do we make data more useful for people to use and to solve problems in their communities.

TANYA OTT: So, we're talking about operating more efficiently, being able to see where there is waste or fraud or abuse, but also just building trust with citizens.

SONAL SHAH: Exactly.

BILL EGGERS: I want to talk more about the value and overall benefits of this data, because they go beyond economic benefits to literally saving lives, in many respects. The health care sector alone collects an immense amount of data: genomic data, electronic health records, clinical trial data, not to mention all the patient-generated data from technologies like smartwatches and mobile apps. So, the health care world is truly a data-rich landscape. But right now, the data exists in silos due to privacy concerns, competitive concerns, barriers to interoperability, and so forth. We could literally save millions of lives if we can figure out how to bring all that data together, and government plays a big role in this because it holds a lot of the data. Increasingly, government can be the platform for bringing together public and private data.

The National Institutes of Health has launched the NIH Data Commons, which is basically a cloud-based platform where investigators can store, share, and access their data and experiment with all sorts of objects generated from biomedical research. This can speed up hypothesis generation, discovery, and evaluation. It's now in its pilot phase, and over the next few years they hope to have a number of high-value, blended datasets that can serve as test cases for the principles, processes, and architecture needed to make this happen. At the same time, the National Cancer Institute has a Genomic Data Commons to enable data sharing. Huge cancer studies are now based on the principle of open science: bringing data together rather than keeping it in silos. In a data economy like the one we're living in, government's ability to serve as a platform for major data efforts involving millions or billions of datasets is incredibly important and powerful.

TANYA OTT: You make a powerful case for that, Bill. You mentioned EHRs, electronic health records, and I know that, at least initially and maybe still today, there were some issues with them, because one hospital might use one piece of EHR software and another might use another, so they didn't really speak the same language. As you get things like the NIH Data Commons or the National Cancer Institute's Genomic Data Commons, is there this issue of different players using languages that don't speak to each other?

BILL EGGERS: Actually, you bring up several key points. Number one, a big piece of doing this right, as with any big technology adoption, is thinking about it from the standpoint of human-centered design and really understanding how people interact with technology: when it works for them [and] when it doesn't. That's why a lot of the products we love are so powerful: they've got the design right, so people adopt them. One of the problems with electronic health records was that a lot of doctors and nurses felt it made their jobs not easier, but a lot harder than before. We didn't take enough account of the cognitive side of design. We talked on a previous podcast about how nudges and behavioral science apply here. That's one of the pieces. And second, of course, there's the issue of interoperability, which for so many of these efforts is one of the biggest challenges.

Another key element of these data platforms is smart cities. Smart cities could become the platform for all the Internet-of-Things data that's so critical to making them a reality. Streetlights with motion sensors that turn on only when needed can save cities money, reduce light pollution, and so forth. But a lot of the data is going to come from a mix of public sources and different infrastructure operators. You have to find a way to make that data interoperable for cities to become truly smart. One of the major areas of emphasis for chief data officers (CDOs) now is that interoperability of data, both within the public sector and across blended public and private data.

SONAL SHAH: It's interoperability, but it's also standardization. We need to know that everybody means the same thing when they talk about one healthcare problem versus another. So, some standardization also needs to take place.

And the second [thing] is that there are rules we need to work through, and lots of chief data officers within government are trying to figure that out. For example, we have HIPAA (Health Insurance Portability and Accountability Act) requirements that restrict how patient records can be shared between governments or between the private and public sectors. CDOs are working through how to protect a patient's privacy while still sharing data, so we know when 40 people across borders have a common disease and what's happening. Take the opioid crisis as a great example. If an ambulance picks you up and takes you to a hospital, then the county is touching you, the city is touching you, the state is touching you, and the federal government is touching you. So sharing data across those four levels of government is super important so we can know when a program is working and when it isn't, how many people are coming in with opioid overdoses, and what's happening in which community. That information is super important to share, but we have to work through the good reasons we have for privacy and figure out how to keep maintaining patient privacy while still being able to make better public policy.

TANYA OTT: Yeah, that's an incredible balancing act, right?

SONAL SHAH: Right. Figuring that out is hard to do, but I think a lot of chief data officers at various levels of government are trying to make it work while understanding that protecting patients is also super important. Let me give you an example where this has worked in an interesting way. Around 2010 or 2011, the Department of Veterans Affairs created something called the Blue Button, which allowed veterans to download their data record and take it with them if they were going to a non-Veterans Affairs hospital. That data record, which was in a standardized format, could then be used by the hospital they took it to. Even though it seems very basic to take a simple file and give it to somebody, that wasn't possible before. Sometimes there are very simple things you can do that don't require huge changes, and even small things can go a long way. In that case, it also gave patients control over their data and who they gave it to.

TANYA OTT: Underpinning some of this is machine learning, which can help discover patterns and anomalies in data and then make predictions from them. But a recent study of several thousand executives at companies in 17 countries found that only 10 percent of their companies were investing in machine learning.1 I'm curious why adoption of such a powerful tool is so low, and how that low adoption may be hamstringing efforts to leverage the real power of big data across government and the private and public sectors.

BILL EGGERS: I do think that's changing fairly dramatically. A few years back, many governments struggled even to understand the value of AI strategies, and there wasn't a lot of talk about them. In that same survey, we found that more than 80 percent of US public sector organizations said they're planning to use AI, and nearly 90 percent consider cognitive technologies to be of extreme importance for their internal business practices.

The other interesting thing is that another study, which looked at AI spending by federal and central governments around the world through 2022, projected a compound annual growth rate of about 44.3 percent, which is even faster than AI spending in personal and consumer services.2 So we're seeing something almost like a hockey stick: not a lot of awareness just a few years ago, and now this is really beginning to accelerate fairly dramatically. But it's going to take a while to get those budgets changed, because the US federal government spends US$90 billion a year on IT,3 and a lot of that is legacy spending on maintaining existing operations and systems. So actually channeling more and more of it to these game-changing technologies is going to take a little bit of time.
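To put the 44.3 percent figure in perspective: a compound annual growth rate $r$ means spending after $n$ years equals the starting level $V_0$ multiplied by $(1+r)^n$. The four-year horizon below is an illustrative assumption, since the study's exact base year isn't given here; under it, spending would grow to roughly 4.3 times its starting level.

$$
V_n = V_0\,(1 + r)^n, \qquad r = 0.443,\; n = 4 \;\Rightarrow\; \frac{V_4}{V_0} = 1.443^4 \approx 4.34
$$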

We've identified over 25 countries around the world that have national AI strategies right now. There are some barriers, of course: governments have cited privacy issues, data quality, data integration, training AI, and simply having the staff to do it. But there is a focus on this, and I do think we'll continue to see big growth over the next decade or two.

SONAL SHAH: Doing this is going to require an investment of money. We talk about getting the data, improving efficiency, and reducing waste, fraud, and abuse, but we don't always talk about how it costs a little bit more up front to put these systems in place so we can benefit from them. I know that's something governments don't always talk about, but it's super important to make that investment.

TANYA OTT: Very good point. Thank you for adding that. A lot of times when laypeople hear the term big data, they kind of zone out because it seems a little overwhelming and they don't really have a sense of how it relates to them. But I heard a phrase recently that really resonated with me. It was, “People hear statistics, but they feel stories.” And that's why this idea of turning data into storytelling is so important. It's like, I guess you might say, the last mile to get us from the data preparation and analysis point to action, by telling stories with the numbers. You have a really great example of this that comes out of New Orleans. Tell us a little bit about that one.

BILL EGGERS: Back in 2010, five years after Hurricane Katrina hit the city, blight was still an intractable problem in New Orleans. The city had over 43,000 dilapidated properties and overgrown lots, one of the worst rates of blight in America. So then-Mayor Mitch Landrieu set a goal of reducing blight in the city by 10,000 units by 2014. Ten thousand units: that's a pretty big goal. Well, the city achieved the goal a full year early and is now down to under 25,000 blighted properties, and one crucial tool in this effort was what they called BlightStat. It was an analytics program that used data from the Code Enforcement Department and other agencies to identify solutions, set priorities, and evaluate the performance of the city's whole campaign to get these troubled properties under control. When they started using it, the rate of property inspections multiplied five-fold in just 10 weeks, thanks to knowledge they extracted from the data and being able to go to the properties that had the most problems. Oliver Wise, then head of the office, said, “We saw eye-popping returns from simply shining light on a service area where there previously had not been light.” So the data let them see a lot of things they simply couldn't see before and target solutions much better and much more rapidly.

One of the biggest uses of data analytics right now is toward what I call anticipatory government: targeting problems before they erupt into crises. Governments are using it for everything from spotting fraud to combating the opioid epidemic, with the notion that an ounce of prevention is worth a pound of cure. We're seeing it used in a wide variety of areas, including defense, health care, human services, and policymaking. The police department of Durham, North Carolina, uses AI and big data in crime fighting, as many police departments now do. It enables the police to observe patterns and interrelations in criminal activity and identify pockets with a high incidence of crime, allowing for quicker interventions. Again, the aim is to move toward anticipatory and preventive government as opposed to reactive government. It contributed to a 39 percent drop in violent crime in the city from 2007 to 2014.

SONAL SHAH: On the international side, there are some really good examples, like how the World Food Programme uses data to decide where to pre-position food before a crisis happens, so that when a crisis hits, they're prepared to get food in at the right time. The same goes for disaster assistance: if there's a hurricane coming, you need to know where to stage yourself so that afterward you can get in quickly and provide services immediately.

BILL EGGERS: Building on that, the government of Indonesia teamed up with a local startup to predict and manage floods. They used historical flood data collected via sensors, and they also tapped into citizen-complaint data to predict the areas most prone to flooding, [so they could] position resources ahead of time and do more from a preventive perspective.

TANYA OTT: All of this data also raises some ethical issues. One of them is algorithmic bias, which is basically that computer systems function in a way that reflects the implicit values of the humans who designed them and programmed them. So how does algorithmic bias work and what can we do to address it?

SONAL SHAH: Whether it's algorithms or statistics or other tools, it's about making sure we ask what the algorithm is predicting and understand: Who can it affect? How can it affect them? Whether there's bias in particular types of police data, like the example Bill mentioned, is a super interesting question. Are we sure we're not just picking out certain communities, and that we're actually looking at characteristics of people, not types of people, to understand where crimes might happen and what kinds of things relate to crime? Those are super important questions to ask up front. You don't build the algorithm and then ask the question. You need to ask it before building the algorithm.

BILL EGGERS: There have been studies on this, such as a famous one from ProPublica that looked at algorithm-based criminal risk assessments in a county in Florida. Controlling for defendants' criminal history, gender, and age, the researchers concluded that black defendants were 77 percent more likely than others to be labeled at higher risk of committing a violent crime in the future.4 The company that developed the tool denied the presence of bias, but still, few of the criminal risk assessment tools in use across the United States (and many jurisdictions are using them right now) have undergone extensive independent study and review.

And I will also say that there are a lot of efforts to address algorithmic bias and black-box issues in algorithms, including efforts to make the technology better. Right now, there are nonprofits working on this, and academics working on it. But it is a complicated area.

One other quick story: Allegheny County, Pennsylvania, developed an algorithmic tool to assess the risk to children in suspected abuse or endangerment cases. The tool conducts a statistical analysis of hundreds of variables to assign a score of 1 to 20 to each incoming call reporting suspected child mistreatment. Call screeners in the social services office then consult the algorithm’s risk assessment to help determine which cases to investigate. Essentially, since you can't investigate them all, which ones are the highest priority? A study suggested that the tool enabled a double-digit reduction in the percentage of low-risk cases proposed for review, as well as a smaller increase in the percentage of high-risk calls marked for investigation. In other words, it reduced the number of false positives. But, like a lot of these risk-assessment tools, this one received some criticism for potential inaccuracies or bias stemming from its underlying data and proxies. One thing Allegheny County did, which I think is best practice, was to be transparent about it. The tool was developed by academics in the fields of social welfare and data analytics, and following an independent ethics review, they were transparent about how they developed the algorithm. That's something we need to see more and more of: a level of transparency where people can interrogate and question the algorithm and make sure that, from a public-policy standpoint, they're okay with how it's been developed.
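The Allegheny tool's actual model and its hundreds of variables aren't described in this conversation. As a hypothetical sketch only, the Python below shows one generic way a risk estimate could be turned into a 1-to-20 score: split past estimates into 20 equal-sized bands (ventiles) and report which band a new call falls in, then order calls so the highest scores surface first. The function names, the invented data, and the banding approach are all assumptions for illustration.

```python
import bisect
import statistics

# Hypothetical past risk estimates (0 to 1) from some model; invented for illustration.
HISTORICAL = [i / 100 for i in range(1, 100)]


def score_cutoffs(historical_risks):
    """Return 19 ascending cut points that split past estimates into 20 equal-sized bands."""
    return statistics.quantiles(historical_risks, n=20)


def to_score(risk, cutoffs):
    """Map one risk estimate to a 1-20 score (1 = lowest band, 20 = highest band)."""
    return bisect.bisect_right(cutoffs, risk) + 1


def triage(calls, cutoffs):
    """Score (call_id, risk) pairs and list them highest score first for screeners."""
    scored = [(call_id, to_score(risk, cutoffs)) for call_id, risk in calls]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    cuts = score_cutoffs(HISTORICAL)
    calls = [("call-17", 0.03), ("call-42", 0.97), ("call-08", 0.50)]
    for call_id, score in triage(calls, cuts):
        print(call_id, score)
```

With banding like this, a score is a relative rank against past calls rather than a probability of harm, and, as in the county's process, the score only informs the screener's decision rather than making it.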

SONAL SHAH: So, if you're building criminal-justice algorithms, make sure that the communities that work with the criminal justice system and understand bias are in the room, or part of the team, as you're building them. It's easy to bring in technical experts, but you also need people who understand the communities well and know what the biases have been. A lot of times, we don't know what those biases are up front. If those people aren't in the room, you may be inadvertently building in a bias without even knowing who else needed to be there.

BILL EGGERS: There are increasing calls to train AI developers to test for and remediate systems that unintentionally encode bias and treat users or other affected parties unfairly. Almost all the time, this is unintentional, but developers aren't trained to look for it, how to detect it, or how to make sure that in things like hiring decisions we aren't encoding the biases of existing hiring systems into these new systems. Adding an ethics component and training to computer-science departments in universities and graduate schools is really going to be important as we try to overcome a lot of these challenges.
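One common way developers test for the kind of unintended bias described above is to compare a model's error rates across groups, for example the share of people who did not go on to reoffend but were still labeled high risk (the false positive rate). The Python below is a minimal sketch with made-up records; the group labels, the record layout, and the use of a max-to-min ratio as a summary are assumptions for illustration, not a method from the CDO Playbook or from any study mentioned here.

```python
from collections import defaultdict

# Hypothetical records: (group, labeled_high_risk, actually_reoffended).
# These values are invented for illustration only.
RECORDS = [
    ("A", True, False), ("A", True, True), ("A", False, False), ("A", True, False),
    ("B", False, False), ("B", True, True), ("B", False, False), ("B", True, False),
]


def false_positive_rates(records):
    """Per group: share of people who did NOT reoffend but were labeled high risk."""
    negatives = defaultdict(int)        # people in each group who did not reoffend
    false_positives = defaultdict(int)  # of those, how many were labeled high risk
    for group, high_risk, reoffended in records:
        if not reoffended:
            negatives[group] += 1
            if high_risk:
                false_positives[group] += 1
    return {g: false_positives[g] / negatives[g] for g in negatives}


def disparity_ratio(rates):
    """Highest group rate divided by lowest (1.0 means parity)."""
    highest, lowest = max(rates.values()), min(rates.values())
    if highest == 0:
        return 1.0  # no false positives in any group
    return float("inf") if lowest == 0 else highest / lowest


if __name__ == "__main__":
    rates = false_positive_rates(RECORDS)
    print(rates)
    print(disparity_ratio(rates))
```

A ratio well above 1.0 means one group bears a disproportionate share of wrongful high-risk labels. Which error metric to compare, and what gap is acceptable, is a policy judgment rather than a purely technical one, which is why the questions Sonal raises need to be asked before the model is built.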

TANYA OTT: Here's my super softball question, but I'm just really curious. It seems like the C-suite is getting a little crowded these days. We heard you refer to CDOs, chief data officers. We've got chief information officers. We've got chief technology officers. Are these all versions of the same thing, or does each have its own role in this big data world of government?

SONAL SHAH: It's not a softball question; it's a hard question.

TANYA OTT: Okay, good!

SONAL SHAH: I would say the following. Chief information officers, chief technology officers, and chief data officers were each created for the need at the time. I would say now is the time not to step back, but to step forward and ask, “Okay, what does the structure need to look like so that all of these groups are talking to each other and there is consistency?” There are agencies where the chief data officer, chief information officer, and chief technology officer all work together, but the arrangement could easily create more silos. So now is the moment to ask how they all need to work together: on the underlying legacy technology, as Bill was talking about earlier; on the new technologies that need to be procured; and on what the data is used for. Who's going to use it? How is it going to be used? Where does it need to go? Who in the system needs access to it? All of those are super important questions that need to be thought through. It all started as filling needs; it just now needs a better organizing mechanism.

BILL EGGERS: It's interesting that just five to seven years ago there was really no such thing as chief data officers in most organizations, and certainly not in government. Now when you look at their responsibilities, they really have a huge level of responsibilities in terms of safeguarding government data, opening up government data to citizens and businesses, managing privacy and protecting citizens' information, looking at all these issues of how to use machine learning across the enterprise, how to develop an AI strategy, and then how to address all the challenges around algorithmic risk and so forth. For a position that didn't exist just a short time ago, there's certainly a lot for them to do. But I agree that we need chief data officers to coordinate with CIOs, chief privacy officers (CPOs), and chief information security officers across agencies to build the team, structure, and budget that can support and appropriately manage all of these data assets. And I do think [having] a CDO in a leadership role helps organize all the data pieces around using public data for the public good.

TANYA OTT: Okay, thank you both so much for being here today. We'll direct everybody to the website to read the full report.

TANYA OTT: That was Bill Eggers, executive director of the Deloitte Center for Government Insights, and Sonal Shah, executive director and professor of practice at Georgetown University’s Beeck Center. Their Chief Data Officer Playbook is available at deloitteinsights.com, where you can also find an archive of our podcasts. I think we’re up to almost a hundred episodes, so there's lots to explore.

You can keep up with us on Twitter at @deloitteinsight. I’m on Twitter at @tanyaott1. Be sure to subscribe to the podcast and leave us a review … We want to know what you think! I’m Tanya Ott. Thanks for listening … catch ya again in two weeks.

This podcast is provided by Deloitte and is intended to provide general information only. This podcast is not intended to constitute advice or services of any kind. For additional information about Deloitte, go to Deloitte.com/about.
