Government CDOs may be among the best placed to encourage health care stakeholders to practice Open Science—the idea that knowledge must flow freely across the scientific community to accelerate breakthroughs that benefit society.
The health care sector is teeming with data. Electronic health records, technologies such as smart watches and mobile apps, and major advances in scientific research—especially in the areas of imaging and genomic sequencing—have given us volumes of medical and biological data over the last decade. One might assume that such a data-rich landscape inherently accelerates scientific discoveries. However, reams of data alone cannot generate new insights, especially when they exist in silos, as is often the case today.
Open Science—the notion that scientific research, including data and research methodologies, should be open and accessible—can offer a solution. Without powerful champions, however, such openness may remain the exception rather than the rule. Practicing Open Science inherently requires cross-sector collaboration as well as buy-in from the public. This is where government chief data officers (CDOs) could play a key role.
The early stages of the Open Science movement can be traced back to the 17th century, when the idea arose that knowledge must flow freely across the scientific community to enable and accelerate scientific breakthroughs that can benefit all of society.1 Four centuries later, Open Science remains an idea that has yet to be fully realized. However, collaborative tools and digital technologies are making the endeavor more achievable than ever before. Rather than simply sharing knowledge in scientific journals, we now have the ability to share electronic health records, patient-generated data, insurance claims data—even genomic data—in standardized, interoperable formats through web-based tools and the cloud. Moreover, with advanced analytics and cognitive technologies, we can process large volumes of data to identify complex patterns that can lead to new discoveries in ways that were almost unimaginable until recently. Using these data and tools is essential to achieving Open Science’s so-called FAIR principles—that data should be findable, accessible, interoperable, and reusable2 (see the sidebar, “What is FAIR?”).
The FAIR principles are a set of guiding principles for scientific data management and stewardship to support innovation and discovery. Distinct from peer initiatives that focus on the human scholar, the FAIR principles put specific emphasis on enhancing the ability of machines to automatically find and use data (in other words, making data “machine-actionable”) in addition to supporting its reuse by individuals. Widely recognized and supported in the scientific community, the principles posit that data should be findable, accessible, interoperable, and reusable.
Consider cancer research. Dr. Jay Bradner, a doctor at a small Harvard-sponsored cancer lab, created a molecule called JQ1, a prototype for a drug to target a rare type of cancer. Rather than keeping the prototype a secret until it was turned into an active pharmaceutical substance and patented, the lab made the drug’s chemical identity available on its website for “open source drug discovery.” The concept of open source drug discovery borrows two principles from open source computing, collaboration and open access, and applies them to pharmaceutical innovation. Scientists from around the world were able to learn about the drug’s chemical identity so that they could experiment with it on various cancer cells. These scientists, in turn, have created new molecules to treat cancer that are being tested in clinical trials.4 Collaborations like these allow hundreds of minds to study the individual pieces of a complex problem, multiplying the usual pace of discovery.
Federal and state governments—and their CDOs—have two unique levers that they can apply to encourage greater openness and collaboration: They hold enormous quantities of health data, and they have the ability to influence policy and practice.
US government health data derives from public programs like Medicare and Medicaid, which collectively cover one in three people in the United States;5 government-sponsored disease registries; the Million Veteran Program (MVP), one of the world’s largest medical databases, which has collected blood samples and health information from a million veteran volunteers; and the National Institutes of Health’s (NIH’s) recent All of Us initiative, a historic effort to gather data from 1 million or more US residents to accelerate research and improve health.6 In addition, federal agencies such as the Department of Health and Human Services (HHS), as well as a handful of states, cities, and counties around the country, have begun hiring CDOs to help determine how data is collected, organized, accessed, and analyzed. According to Project Open Data, an online public repository created by President Barack Obama’s Open Data Policy and Executive Order,7 the CDO’s role is “part data strategist and adviser, part steward for improving data quality, part evangelist for data sharing, part technologist, and part developer of new data products.”8
CDOs looking to advance Open Science should consider ways to meaningfully share more government health data and to encourage nongovernment stakeholders, including academic researchers, health providers, and ordinary citizens, to participate in Open Science data platforms and share their own data. To do so, they will need to address the various technological, policy, and cultural challenges.
Open Science requires a technological infrastructure that allows data to be securely shared, stored, and analyzed. In an effort to develop this infrastructure, the NIH has begun piloting a “Data Commons,” a virtual space where scientists can store, access, and share biomedical data and tools. Here, researchers can utilize “digital objects of biomedical research” to solve difficult problems together and apply cognitive computing capabilities in a single cloud-based environment.9 This platform embraces the FAIR principles, including the need to safeguard the data it contains with secure authentication and authorization procedures. The pilot is due to be completed in 2020,10 after which lessons learned are expected to be incorporated into a number of permanent, interoperable, sustainably operated Data Commons spaces.
A Data Commons, however, is only as good as the quality and quantity of the health data it contains. Government health agency CDOs can play an important role in increasing participation in Data Commons by moving their agency’s data from on-premises storage to large-scale cloud platforms that are interoperable with the NIH’s Data Commons, which makes that data more accessible. Equally important is improving the quality of the shared data, which means putting it in formats that are findable, interoperable, and reusable; that is to say, making it machine-actionable.
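To make “machine-actionable” more concrete, the following is a minimal sketch of how an agency might screen a dataset’s metadata before sharing it. The field names, the grouping of fields under each FAIR principle, and the example record are all illustrative assumptions; they are not an official FAIR schema or anything prescribed by the NIH Data Commons.

```python
import json

# Hypothetical minimum metadata fields, grouped by FAIR principle.
# These names are illustrative assumptions, not an official FAIR schema.
REQUIRED_FIELDS = {
    "findable": ["identifier", "title", "keywords"],
    "accessible": ["access_url", "access_protocol"],
    "interoperable": ["format", "vocabulary"],
    "reusable": ["license", "provenance"],
}


def missing_fair_fields(record: dict) -> dict:
    """Return, per principle, the required fields that are absent or empty."""
    gaps = {}
    for principle, fields in REQUIRED_FIELDS.items():
        missing = [name for name in fields if not record.get(name)]
        if missing:
            gaps[principle] = missing
    return gaps


# Made-up metadata record for a dataset; every value is a placeholder.
example_record = {
    "identifier": "doi:10.0000/example",
    "title": "De-identified claims sample",
    "keywords": ["claims", "sample"],
    "access_url": "https://example.org/datasets/claims-sample",
    "access_protocol": "https",
    "format": "text/csv",
    "vocabulary": "",  # empty: no shared vocabulary declared
    "license": "CC0-1.0",
    # "provenance" is deliberately left out
}

if __name__ == "__main__":
    # Report which fields would need to be filled in before sharing.
    print(json.dumps(missing_fair_fields(example_record), indent=2))
```

A check like this only confirms that descriptive fields are present. Whether a dataset is truly interoperable or reusable still depends on the standards, licenses, and documentation behind those fields.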
The legal and regulatory landscape surrounding what data can be shared, with whom, and for what purpose can be a source of confusion and caution among health care providers and institutions that collect or generate health data. The real and/or perceived ethical, civil, privacy, or criminal risks associated with data-sharing have led many researchers and health care stakeholders to avoid doing so entirely unless they feel it is essential. This “better safe than sorry” approach can impede high-impact, timely, and resource-efficient discovery science. Furthermore, in academia, a researcher’s career advancement can depend on his or her ability to attract grant funding, which in turn depends on his or her ability to generate peer-reviewed publications. In this competitive environment, researchers have little incentive to collaborate with and share their valuable data with their peers. On top of these barriers, the effort and cost associated with making data FAIR are significant.
Government CDOs have an opportunity to overcome such barriers to data-sharing through a combination of education, support structures, and appropriate policies and governance principles. CDOs could conduct educational outreach to academics, health care providers, and other stakeholders to clarify data privacy laws such as the Health Insurance Portability and Accountability Act (HIPAA) and the Health Information Technology for Economic and Clinical Health (HITECH) Act. The goal would be to help these stakeholders understand that, rather than prohibiting data-sharing, these laws merely define parameters around when and how to share data. Through written materials, videos, and live workshops, CDOs can clarify regulatory requirements to encourage data-sharing among health care stakeholders and individuals who are being asked to share their personal health information.
In addition to educating stakeholders, CDOs can prompt agencies to take advantage of certain policies that allow government agencies to require data-sharing. The 21st Century Cures Act, for instance, gives the director of the NIH the authority to require that data from NIH-supported research be openly shared to accelerate the pace of biomedical research and discovery.11 Such policies must be complemented with appropriate benefits for researchers who share their data, such as appropriate consideration for additional grants or co-authorship on publications that use their data.
Open Science requires cross-sector participation and engagement from government entities, health care stakeholders, researchers, and the public. As part of their efforts to evangelize data-sharing, CDOs should consider engaging the broader community by stoking genuine interest and appreciation of the crucial role data-sharing plays in science and innovation and the benefits every player can gain from it.
One way of engaging health care stakeholders and scientists is by giving them access to appropriate government data and tools so that they can begin using shared data and seeing its value for themselves. Another way is to seek innovative solutions to health and scientific challenges using community engagement models such as code-a-thons, contests, and crowdsourcing.12 CDOs can also encourage the general public to ensure that their data contributes to Open Science by educating them on how they can—directly or through patient advocacy organizations—encourage researchers and clinicians to share the data they collect. Lastly, with private individuals increasingly generating large volumes of valuable health data through wearables and mobile devices, CDOs can help such individuals understand how they could best share this data with researchers.
The proliferation of digital health data, coupled with advanced computational capacity and interoperable platforms such as Data Commons, gives society the basic tools to practice Open Science in health care research. However, making Open Science a reality will require all health care stakeholders, including ordinary citizens, to participate.13
Government CDOs can accelerate the spread of Open Science in several ways. They can establish policies and governance principles that encourage data-sharing. They can conduct education, outreach, and community engagement efforts to help stakeholders understand why and how to share data and to encourage them to do so. And they can serve as role models by making their own agencies’ data available for appropriate public use.
Like all important movements, Open Science will likely face ongoing challenges. Those at the helm will need to balance the opportunities it provides with the inherent risks, including those related to data privacy and security. Of all the stakeholders in scientific discovery, government CDOs may be among the best placed to help society sort through these opportunities and risks. As public servants, they have every incentive to embrace a leadership role in promoting Open Science for the common good.