How CDOs can overcome obstacles to open data-sharing
Governmental CDOs can take the lead in making data publicly accessible, allowing citizens to help find solutions for thorny social problems. What challenges could stand in the way?
Open data has been a hot topic in government for the past decade. Various politicians from across the spectrum have extolled the benefits of increasing access to and use of government data, citing everything from enhanced transparency to greater operating efficiency.1
While the open data movement seems to have achieved some successes, including the DATA Act2 and data.gov,3 we have yet to achieve the full potential of open data. The McKinsey Global Institute, for example, estimates that opening up more data could result in more than $3 trillion in economic benefits.4
It is time for the open data community to pivot based on the lessons learned over the past decade, and governmental chief data officers (CDOs) can lead the way.
Much valuable government data remains inaccessible to the public. In some cases, this is because the data includes personally identifiable information. But in other situations, data remains unshared because government has procured a proprietary system that prevents sharing. Moreover, when government does share data, it sometimes does so in spreadsheets or other formats that limit its usefulness, rather than through a machine-readable channel such as an application programming interface (API) that would allow for easier reuse. In fact, some of the potentially most valuable public information, such as financial regulatory filings, is typically not machine-readable.
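To make the difference between a spreadsheet export and machine-readable data concrete, here is a minimal sketch in Python. It takes a small, made-up spreadsheet export and turns it into typed records of the kind an API might return. The agency names, column headings, amounts, and the `to_records` helper are all hypothetical and chosen only for illustration; they do not describe any real government data set or API.

```python
import csv
import io
import json

# Hypothetical spreadsheet export: display-oriented headers and
# amounts formatted as text with thousands separators.
SPREADSHEET_EXPORT = """Agency,Fiscal Year,Amount (USD)
Example Agency A,2018,"1,250,000"
Example Agency B,2018,"980,500"
"""


def to_records(csv_text):
    """Convert spreadsheet-style rows into typed, consistently named records."""
    reader = csv.DictReader(io.StringIO(csv_text))
    records = []
    for row in reader:
        records.append({
            "agency": row["Agency"],
            "fiscal_year": int(row["Fiscal Year"]),
            # Strip the thousands separators so the amount is a real number.
            "amount_usd": int(row["Amount (USD)"].replace(",", "")),
        })
    return records


records = to_records(SPREADSHEET_EXPORT)

# The same data as a JSON payload, the shape an API response might take.
api_response = json.dumps({"results": records}, indent=2)
print(api_response)
```

The point of the sketch is that someone has to do this cleanup either way. When the publisher does it once and serves the result through an API, every downstream user is spared from repeating it.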
CDOs looking to achieve greater benefits through open data should devise a plan that addresses both the technical and administrative challenges of data-sharing, including:
- Mismatched incentives between political leaders and their staff: Not all data can or should be shared publicly. Agencies are prohibited from sharing personally identifiable data, medical data, and certain other information. There are, however, many gray areas regarding what can or cannot be disclosed. In these instances, the decision on whether and how to standardize or publish a government data set has all the ingredients of a standard principal-agent problem5 in economics. The principals (here, the public, legislators, and, to some extent, executive branch leaders) generally want data to be open because they stand to reap the societal and/or reputational benefits of whatever comes from releasing it. However, the decision of whether to standardize or release data is made by an agent (here, usually some combination of program managers, information technology professionals, and lawyers). The agent tends to gain little direct benefit from releasing the data—but they could face substantial costs in doing so. Not only would they need to do the hard work of standardization, but they would incur the risk of reputational damage, stress, or termination if the data they release turns out to be inaccurate, creates embarrassment for the program, or compromises privacy, national security, or business interests. As a result, even if a political leader wants to share data, there may still be obstacles to doing so.
- An “all or nothing” approach to data-sharing: The discussion of open data is often presented in binary terms: Either data is open, meaning that it is publicly available in a standardized format for download on a website, or it is not accessible to outsiders at all. This type of thinking takes off the table intermediate options that could provide much of the benefit of full disclosure at lower cost, lower risk, or both. The experience of federal statistical agencies suggests that intermediate approaches could allow even some sensitive data to be shared on a limited basis.6 For example, the Centers for Medicare & Medicaid Services allows companies to apply for limited, secure access to transaction data to help them develop products that aim to improve health outcomes or reduce health spending.7
- Lack of technical expertise: Releasing a data set is generally time-consuming technical work that may require cleaning the data and deciding on privacy protections. Some governments have limited in-house technical expertise, however, and the experts they do have are often needed for competing priorities. The skills needed to appropriately release data sets that contain sensitive information are even more specialized, requiring people who understand advanced privacy-preserving techniques such as synthetic data8 and secure multiparty computation.9 Usually, the subject-matter experts who control whether a given data set will be opened do not have this expertise. This is understandable, as such skills were not historically necessary or even useful, but the skills gap can prevent governments from sharing data even when all stakeholders agree that it should be shared.
- Difficulty in prioritizing data sets: Just as releasing data typically requires a rare combination of subject-matter and technical expertise, so does figuring out which data sets to prioritize. Seeing how government data might be put to beneficial use requires imagination from people with varied perspectives. Government officials cannot always predict which data sets, especially when used in concert with other data sets, might prove transformative. This is even more true when considering the details of how data should be shared.
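To give a sense of the kind of technique the expertise gap refers to, here is a minimal sketch in Python of one building block of secure multiparty computation: additive secret sharing. Several parties compute a combined total without any of them revealing its own figure. The agency counts, the modulus, and the function names are assumptions made up for this illustration, and the code is a teaching sketch rather than a production protocol (it omits networking, authentication, and protection against dishonest participants).

```python
import secrets

# Assumed modulus: any value comfortably larger than the true total works.
MODULUS = 2**61 - 1


def split_into_shares(value, n_parties, modulus=MODULUS):
    """Split value into n_parties shares that sum to value modulo modulus.

    All but the last share are uniformly random, so any subset of fewer
    than n_parties shares reveals nothing about the original value.
    """
    shares = [secrets.randbelow(modulus) for _ in range(n_parties - 1)]
    last_share = (value - sum(shares)) % modulus
    return shares + [last_share]


def secure_sum(private_values, modulus=MODULUS):
    """Simulate the parties jointly computing the sum of their private values."""
    n = len(private_values)
    # Each party splits its own value and hands one share to every party.
    all_shares = [split_into_shares(v, n, modulus) for v in private_values]
    # Party j adds up the shares it received; each one looks random on its own.
    partial_sums = [
        sum(all_shares[i][j] for i in range(n)) % modulus for j in range(n)
    ]
    # Only the partial sums are published; together they yield the total.
    return sum(partial_sums) % modulus


# Made-up figures held privately by three hypothetical agencies.
agency_counts = [1200, 450, 3075]
total = secure_sum(agency_counts)
print(total)
```

Even this toy version shows why such work calls for specialists: the privacy guarantee depends on details such as the source of randomness and the choice of modulus, which subject-matter experts would not ordinarily be expected to evaluate.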
CDOs looking to unleash the potential of open data should consider ways that they could address these obstacles. One potential approach is to centralize decision-making authority and technical capabilities rather than having these distributed among the numerous offices and departments that “own” the data. The General Services Administration, for example, created a chief data officer position to act in this capacity. Several other agencies have done the same, and Congress is currently considering legislation to require every agency to do so.10
The open data community, for its part, can play an important role in encouraging data-sharing by helping agencies understand what data would be most useful under what conditions. CDOs sometimes lack the political strength or the management or technical bandwidth to release all of their agencies’ data, even if this were always desirable, so prioritization is key. Regulated entities and beneficiaries should also help the government determine the next-best alternative when full openness is not possible. A few agencies, such as the US Department of Health and Human Services with its Demand-Driven Open Data effort, have invited the public to engage in prioritization. To promote greater openness, however, such efforts should be spread across more agencies and involve more levels at those agencies. Understanding the perspectives of those outside government can help officials balance the trade-off between releasing data and controlling the risks and costs.
CDOs’ leadership will be important in encouraging government to move swiftly to release all appropriate data that could benefit our society, democracy, and economy. To be most effective, they may need private-sector input and policy guidance to support them on the open data journey.