Cover image by: Alex Nabaum
The breakthroughs and benefits of data-sharing are well documented. Sharing data can save time, money, and lives.1 The COVID-19 pandemic is a prime example of this.
With lives and the economy on the line, US organizations that normally compete and do not share data released massive amounts of data to better understand and contain the developing pandemic. The global crisis also revealed how much more progress could still be made on data-sharing.
From understanding how the virus manifests in different populations and which treatments are most effective, to identifying its emerging long-term effects, to developing, administering, and monitoring a safe, efficacious vaccine—these activities largely depend on open access to data and collaborative science. It amounts to conducting research and governing with the lights on. We can all see better.
Yet the challenges to data-sharing are much more than technical. Deloitte hosted the Virtual Convening on Data-sharing in Biomedical Research, a conference with more than 30 leaders from across the US research, health care, government, and data science sectors.2 The consensus: The more established the research space, the harder it is to share data and collaborate. A major reason: The assumptions and mindsets that have taken root over the years create organizational structures and incentives that can act as barriers to successful data-sharing.
Making significant changes in this mindset often requires a large shock. The COVID-19 pandemic provided such a shock, helping to change the status quo and accelerate data-sharing. But how can government leaders sustain the urgency of a global emergency and continue to promote the kind of data-sharing and open science that accelerate discovery and responsiveness?
We identify seven ways government leaders can create a culture that prioritizes sharing health data for the greater good:
1. Reset default to “share.” Government leaders should start with the expectation that data has value for the public and should be shared. This is largely a paradigm shift away from the current possessive default position that most academic, government, and commercial health care and life sciences organizations vigorously maintain. It may be time for a perspective reset from whether to share to how to share.
Some propose mandating a standard framework for infectious disease reporting and decision support. Digital Bridge3 is an example of a framework that draws upon the now ubiquitous footprint of electronic health records (EHR) in the United States as the standard source of rich, real-time data used for public health surveillance of infectious diseases. One founding principle of the Digital Bridge is that, wherever possible, existing standards and technologies would be used to make reporting infectious diseases seamless and improve the timeliness, accuracy, and value of the data.4 Architects of Digital Bridge say the reduced burden of reporting was what initially sold providers and practitioners on sharing their data using the platform.5 The research value of more timely data is gaining recognition.
For example, the National Center for Advancing Translational Sciences (NCATS) initiative National COVID Cohort Collaborative (N3C)6 is a centralized data analytics platform which, in a matter of months, brought together clinical, laboratory, and diagnostic data from medical research sites across the country. The platform enables rapid collection and analysis of COVID-19–related EHR data going back two years and is preparing to continue collection for several more years, which will allow evaluation of chronic COVID-19 sequelae. There are currently more than 2 billion rows of data on the platform, which creates vastly more statistical power than any single medical research site could achieve.7 The potential for discovery and breakthroughs is enormous. Researchers and health care providers can finally answer clinically important questions such as, “Can we predict who might need dialysis because of kidney failure?” or “Who might need to be on a ventilator because of lung failure?” or “Are there different patient responses to coronavirus infection that require distinct therapies?”8
NCATS director Chris Austin said N3C’s collaborative feat would have been impossible in pre–COVID-19 times, but now the N3C platform has achieved the power to tip the risk/benefit calculus toward data-sharing in the public interest. However, he fears a return to the pre–COVID-19 “dark ages” of data silos if all parties in the research ecosystem don’t continuously maintain a mindset that prioritizes sharing and collaboration.9
Creating a culture where data-sharing is the norm will require courageous leadership that doesn’t take “no” for an answer. Leadership in the new era of data-sharing is expected to actively pursue alliances, prioritize sharing and collaboration, and persist through challenges.
2. Empower citizen ownership. Digital user experience platforms are driving improvements to products and services across virtually every sector—from tourism and technology to food and fitness. Citizens have the same opportunity to share feedback and data, leading to more responsive and personalized services.
In the health care and biomedical research field, patients are the true owners of the health data used to provide care and create new drugs and treatments. This data has an intrinsic value for patients, and they will protect their privacy until they see a commensurate benefit in return.10 With such a clear benefit from improved health care, an overwhelming majority of patients are supporting more effective data-sharing.11 As such, patients are in a prime position to agitate and advocate for increased sharing of data and to raise awareness about the consequences of not doing so.
An increasing number of researchers and health care advocates are going directly to patients as a data source and giving them the tools to understand and dictate how data is used. For instance, the National Institutes of Health (NIH) “All of Us” research program seeks data from more than a million people living in the United States to accelerate research that is more responsive to, and reflective of, a diverse population. The effort includes a survey of how people are affected by COVID-19. The findings could be important in understanding disparities and the longer-term effects of the virus.12
Increasing public awareness about the value of data and how it can be used to improve health and find cures and treatments can be important to accelerate data-sharing.
3. Reward sharing and resharing. To encourage data-sharing, sharing and resharing should be rewarded. Incentives, in the form of funding and career advancements, should focus on celebrating data-sharing and collaborative science and recognize these aspects as a differentiator of superior performance.
Researchers are typically hesitant to share data they’ve spent decades curating, standardizing, and making useful, particularly before they’ve extracted all of its value. Continued funding of research, and their career success, depends on their ability to be the first to publish new and important discoveries. By sharing data, then, they run the risk that others will use their data to make discoveries and publish first.
In addition, the majority of funding and incentive structures encourage the creation of new and duplicative databases when what is actually required is curating, harmonizing, and making data interoperable and reusable.
Making data interoperable allows researchers studying different diseases to join forces. For example, researchers studying multisystem inflammatory syndrome in children (MIS-C), a condition seen in some children diagnosed with COVID-19, were granted access to the world’s largest database of cases of Kawasaki disease, a condition with very similar symptoms and immune responses. By comparing and contrasting patients’ symptoms and outcomes, scientists discovered that common therapies used for Kawasaki disease were proving effective in pediatric MIS-C cases.13 This collaboration likely advanced science more rapidly than any institution could accomplish alone.
To incentivize data-sharing and open science, the Chan Zuckerberg Initiative provides grants specifically for researchers who will work together across disciplines to conduct research and make their findings broadly available. The approach pairs complementary investigative approaches and makes it easy to share and reshare data by establishing open repositories for software code, experimental protocols, and results that are uploaded to preprint servers to communicate them more quickly.14
Bringing new combinations of disciplines and data together is creating exciting new breakthroughs. The NIH is a leader in facilitating and funding open science. For example, an NIH-funded research project combining expertise in computational biology and machine learning (ML) with expertise in gene regulation and heart development is working to help identify a breakdown in the DNA of developing fetal hearts. By analyzing thousands of patients’ genetic mutations, researchers plan to use a 4D model to identify new mutations that are predicted to cause abnormal folding. The researchers hope the findings will give children with congenital heart disease an accurate genetic diagnosis that can lead to more effective treatment options. The model will also be made publicly available for other researchers to use with their data.15
4. Make data science a full partner. The research enterprise needs more experts in data analytics, computer science, and ML to exponentially speed advances that can solve some of the world’s most vexing problems. Bench scientists and epidemiologists can no longer do it alone. The power of advanced computing capabilities in querying a vastly larger number of combinations and questions in the midst of exploding health data can dramatically reduce timelines for developing effective new drugs and therapies.16
ML and artificial intelligence (AI) are becoming more common in study analysis plans but are too often viewed as costly “extras” rather than as integral to discovery. Data scientists should be considered equal partners in collaborative science, and data science should be recognized as a distinct discipline with serious and robust research methods and scholarship.17 Government leaders have the opportunity to advance and elevate data science and to provide scientists with professional development opportunities, including participation in hackathons and data-focused conferences.
The biomedical research industry has a pressing need to hire data science talent in this new era of data-sharing and open science but will be competing with other industries (e.g., the private technology sector). It should plan incentives to recruit and retain data and computer scientists while helping current biomedical researchers update their skills and think about the integral role data science plays in discovery. Such training can make scientists more inclined to share and collaborate, because they will understand the increased power and validity that come with larger quantities and more combinations of data.
Those participating in Deloitte’s Virtual Convening on Data-sharing say this influence can help pave career paths that make collaboration a norm. They add that because COVID-19 was a fresh, new challenge, there were none of the territorial battles common in disease research that can trip up collaboration. The lack of self-interest and focus on public interest created unprecedented collaboration and discovery.18 Government leaders should take every opportunity to nurture this new momentum and culture shift.
5. Demand accountability for not sharing. There are several examples of incentives and other carrots that encourage opening access to data and collaboration in research. Is there a role for sticks and consequences for not sharing?
This point is worth considering as decisions to not share data can largely go unnoticed. Recent Wall Street Journal coverage called out health care’s challenges around real-time data. “Why hospitals can’t handle COVID-19 surges: They’re flying blind,” a former US Department of Health and Human Services emergency health planning official was quoted as saying. He added, “It’s staggering to most people how little visibility there is outside of a particular health system… the attempts to build a federal system to share information have failed… Every time these things happen everybody throws their hands up and says, ‘I can’t believe these things don’t work more closely together.’”19
Raising awareness about the blind spots in health care data, and potentially life-and-death consequences of not sharing, can be important in changing the mindset and behavior toward sharing.
Consider the massive reorganization and culture change at the Joint Special Operations Command (JSOC), engineered by General Stanley McChrystal. McChrystal transformed the unit responsible for defeating decentralized terrorists into a “share-first” organization by holding people accountable for not sharing. McChrystal realized better information needed to move across the command more quickly, with both soldiers and analysts given more degrees of freedom to act. “We tried to make it the culture where, if you don’t share information, you can be held accountable for that. If somebody didn’t know something they needed to know, and you had that information, then they shouldn’t have to ask you the question: If you know they need it, you need to make sure they get the information.”20
Journalists, patient advocates, and government leaders can challenge common excuses for not sharing data in the public interest. Common justifications include claims that “others’ data” is not good enough and a performative, relentless pursuit of clarity, perfection, and negotiation. As long as these excuses are accepted and go unchallenged, hoarding of data can persist.
Additional consequences could come from agencies and funders that establish and enforce performance standards for data-sharing and collaboration to create a shared responsibility for results. If these research standards become the norm, the architects of personnel management could help enshrine them in performance expectations, encouraging change agents to further an insurgency toward shared data and collaborative science.
6. Prioritize ethical dimensions critical to trust. Public and practitioner trust in data-sharing and collaborative science is imperative for continued success in the area. This trust requires continuous work to ensure privacy, patient consent, ethical use, and transparency. Lessons can be learned from groundbreaking work in genetics, the military, and AI—all of which engage the broadest possible range of expertise and perspectives to address the ethical ramifications of new technology and tools. Such an approach can provide a 360-degree view of data-sharing that anticipates the cascading consequences of decisions made.
Whether unintentional or the result of bad actors, there is ample opportunity for things to go wrong while sharing data. Ethical codes and foundations should be defined to ensure shared data is used for equitable, ethical public good. This is a nuanced task. An obvious ethical dimension is understanding that data is sensitive and privacy must be protected. Equally important is ensuring that the algorithms and AI tools used to analyze data eliminate biases that can disadvantage certain parts of the population.
The US Department of Defense has created a set of guiding principles to help safeguard ethics and build a trustworthy AI strategy.21 More broadly, a framework for “Trustworthy AI” can help promote not only the ethical use of AI, but also reliability and user confidence in AI. The framework uses six dimensions to help ensure trustworthiness, requiring AI methods to be fair and impartial, transparent and explainable, responsible and accountable, robust and reliable, safe and secure, and respectful of privacy.22
As the digital world rapidly expands and takes new forms, the imperative to build and maintain trust in sharing data can’t be overlooked or minimized. A trusted, neutral intermediary organization with strong leadership can garner the trust and confidence of researchers, practitioners, providers, data scientists, and patients and work through the inevitable and ongoing ethical dimensions of data-sharing and open science.
7. Capitalize on the current momentum. In response to the 9/11 terrorist attacks, the White House and Congress created the Department of Homeland Security (DHS), the Office of the Director of National Intelligence, and other organizations to rapidly expand the sharing and coordination of national security information across government. The goal: to see and respond better.23
This is an instructive example of ambitious action taken to bring about systemic changes related to an urgent national priority. Government leaders might ask whether a similar entity or bold, bipartisan action is now needed to coordinate surveillance of multiple sources of health data and intelligence and create an infrastructure to act on that data.
The Department of Health and Human Services (HHS) has the Office of the National Coordinator (ONC) for Health IT, which was created by an executive order in 2004 and mandated by Congress in 2009, to support a nationwide, interoperable health information exchange. Similarly, the NIH’s final Policy for Data Management and Sharing explicitly requires researchers to share scientific data generated through NIH funds. The policy punctuates the NIH’s leadership in open science and will hold grantees accountable for sharing starting January 2023.24 The efforts of both the ONC and the NIH merit the full support of the health and biomedical research enterprise.
Yet simply having a policy that requires data-sharing is not enough. Governments need the organizational and technological infrastructure to make sharing a reality. In the aftermath of 9/11, the Office of the Director of National Intelligence (ODNI) was created to coordinate activities across the intelligence community, instating an organizational focal point for data-sharing. The ODNI also helped develop a common cloud platform for the intelligence community—the Intelligence Community Information Technology Enterprise (IC ITE). The IC ITE provides a common architecture to enable better sharing of data from 17 different entities. Having easy access to this data source allows AI and ML applications to identify threats in a range of data sources from bank accounts to satellite imagery to police reports.25
Health care and biomedical research have similar reasons and motivations to create a robust mechanism for sharing data, though the task is more complicated. Making progress against public health threats such as COVID-19 and diseases such as cancer and Alzheimer’s could require data-sharing across health care’s federated public-private ecosystem. A national health data-sharing authority could facilitate sharing among many groups, including biomedical researchers, patients, pharma companies, hospitals, and more. Such an entity could provide a level of accountability and trust that shared data would be used and credited appropriately and fairly. It could also ensure that the value of contributed data assets is equitably accounted for, so that sharers’ business models retain their integrity (e.g., via data marketplaces). A federal coordinating body would be in a position to secure significant long-term financing, the best talent in data science and cybersecurity, and a responsive approach that delivers value to researchers and practitioners while using the latest tools to protect privacy.
After 9/11, the United States committed as a nation to change its expectations and develop new routines and ways of thinking in the interest of national security. This might be precisely the culture shift and infrastructure that health care data-sharing requires at this moment.
Having witnessed the starring role data has played related to COVID-19, it’s clear that data-sharing and collaborative science are possible and worth promoting. There is a growing body of knowledge designed to help government leaders maximize the value of data. Our colleagues’ Seven lessons COVID-19 has taught us about data strategy26 provides guidance that covers the collection, analysis, and presentation of data, as well as data governance and privacy issues. This knowledge, combined with advances in technology, makes efficient data-sharing and analysis more achievable than ever.
COVID-19 has shown that accelerated and responsive data collaboration is possible, so why not extend that same success to other diseases or other hard problems facing government?
These seven strategies can keep the motivating force of a national crisis front and center to create a culture that prizes the discovery, innovation, and public good that come from data-sharing and open science.