Enterprise data sovereignty: If you love your data, set it free (Tech Trends 2018)
Making data “free”—available and actionable to all business units, departments, and geographies—sounds great, right? But doing so requires implementing modern approaches to data architecture and governance, and navigating global regulations around privacy and protection.
We have entered a new age of digital enlightenment—one driven by ever-growing volumes of data and the valuable customer, strategic, and operational insights that information contains. In this new age, not only is there more data than ever before—it is being generated by a wider variety of sources, making it more revealing. As Deloitte’s 2017 Tech Trends report explored, insight-rich data from transactional systems, industrial machinery, social media, IoT sensors—and from nontraditional sources such as images, audio, video, and the deep web—increasingly informs decision-making and helps chart new paths to the future.1
To those already on the path to digital enlightenment, it is becoming increasingly clear that to realize its full potential, data should be free—free not in a monetary sense but, rather, in terms of accessibility and ubiquity. At a time when traditional boundaries separating organizational domains are coming down, it becomes more important than ever to expose data widely so that analysts can use it to create value.
Yet even when data is free, we have to make sense of it. Traditionally, “making sense of data” meant imposing upon it top-down, canonical definitions and hierarchies of access rights and creating layer upon layer of governance protocols. This Dewey Decimal System-esque approach has been, in essence, just a formalized way to try to control chaos using brute force.
We expect that, in the next 18 to 24 months, more companies will begin modernizing their approaches to data management, working to strike the right balance between control and accessibility. As part of the growing trend toward enterprise data sovereignty, these companies will develop deliberate techniques for managing, monetizing, and unlocking the value of an increasingly vital enterprise asset.
Their efforts will focus on solving data challenges in three domains: management and architecture, global regulatory compliance, and data ownership. The challenges that many organizations encounter in each of these areas are varied and persistent. For example:
- How can we expose data across organizational boundaries and functional domains while still managing it deliberately and effectively?
- How can we automate laborious and often manual data classification and stewardship tasks?
- How can we, as a global company, comply with regulatory and privacy requirements that differ dramatically by nation?
- Who in the enterprise is ultimately responsible for all this data? Does the CIO own it? The COO? Anybody at all?
The enterprise data sovereignty trend offers a roadmap that can help companies answer these and other questions as they evolve into insight-driven organizations. Without a doubt, this transition will require long-term investments in data integration, cataloging, security, lineage, augmented stewardship, and other areas. But through these investments, companies can create a dynamic data management construct that is constantly evolving, learning, and growing.
Data, then and now
IT departments developed traditional data management techniques when data volumes were still relatively small. In this simpler time, structured business data typically lived in tables or basic systems.
Even then, strategists, CIOs, and other decision-makers struggled to get their arms—and heads—around it. Many companies took one of two basic approaches for dealing with data:
Laissez-faire. Decision-makers accepted that data management was messy and difficult, so rather than face its challenges deliberately, they built one-off systems to address specific needs. Data warehouses, operational data stores, reports, and ad-hoc visualization ruled the day, requiring behind-the-scenes heroics to rationalize master data, cleanse dirty data, and reconcile discrepancies.
Brute force. Recognizing data’s greater potential, some companies tried—with mixed success—to get their arms around the data they possessed by creating a citadel in which data was treated as scripture. All processes were strict and regimented, which worked when all data was structured and uniform but became difficult to sustain when different types of data entered the system. To maintain data consistency and quality, companies relied heavily on mandates, complex technologies, and manual procedures.
Fast-forward two decades. Both of these approaches have proven inadequate in the age of big data, real-time reporting, and automation, especially as data continues to grow in both volume and strategic importance. Moreover, this phenomenon is encompassing all industries and geographies. Consider the automobile, which has in recent years become less a machine than a sensor-laden, data-spewing computer on wheels. Recently, Toyota, Ericsson, and several other companies announced that they will jointly develop new data management architectures to accommodate an expected explosion of automotive-generated data. “It is estimated that the data volume between vehicles and the cloud will reach 10 exabytes per month around 2025, approximately 10,000 times larger than the present volume,” the consortium reported.2
To be clear: 10 exabytes (EB) is 10 billion gigabytes, from cars alone, every month.
IDC offers a macro view, predicting that by 2025, the world will create and replicate 163 zettabytes of data annually (a ZB is 1 trillion gigabytes), representing a 10-fold increase over the annual amount of data generated just nine years earlier.3
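The unit conversions behind these figures are easy to check. The following is a minimal sketch, assuming decimal (SI) units in which 1 GB is 10^9 bytes; the variable names are illustrative and do not come from the cited reports.

```python
# Decimal (SI) storage units, expressed in bytes.
GB = 10**9
PB = 10**15
EB = 10**18
ZB = 10**21

# 10 exabytes per month, expressed in gigabytes.
vehicle_gb_per_month = 10 * EB // GB

# The consortium describes this as roughly 10,000 times the present volume,
# which implies a present volume of about 1 petabyte per month.
implied_present_pb = 10 * EB / 10_000 / PB

# One zettabyte in gigabytes, and the baseline implied by a
# 10-fold increase to 163 ZB per year.
zb_in_gb = ZB // GB
implied_baseline_zb = 163 / 10

print(vehicle_gb_per_month)   # gigabytes per month from vehicles
print(implied_present_pb)     # petabytes per month today (implied)
print(zb_in_gb)               # gigabytes in one zettabyte
print(implied_baseline_zb)    # zettabytes per year nine years earlier (implied)
```

Read this way, the two forecasts are consistent with the text: 10 billion gigabytes per month from vehicles, and a baseline of roughly 16 ZB per year behind IDC's 10-fold projection.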
With this data tsunami approaching—or already here, depending on whom you ask—forward-thinking companies can launch their enterprise data sovereignty journeys by answering the following foundational questions about advanced data management and architecture, global regulatory compliance, and ownership:
What will advanced data management and architecture look like in my company? When we speak of data management in the context of enterprise data sovereignty, we are talking about much more than how and where data is stored. We are also describing:
- Sourcing and provisioning of authoritative data (for example, batch, real-time, structured, unstructured, and IoT-generated), plus reconciliation and synchronization of these sources
- Metadata management and lineage
- Master data management and unique identifiers
- Information access and delivery (for example, analytics and upstream/downstream consuming applications)
- Security, privacy, and encryption
- Archiving and retention
With traditional data management tools and techniques, these complex tasks often require manual intervention. Moving to the cloud or adopting a federated system can add further layers of complexity.
As companies explore ways to deploy new tools and redesign their data management architectures, they should think less about organizing data into specific structures, instead focusing on deploying tools within new architectures to automate the decision-making processes in sourcing, storing, and governance. Though architectures vary by need and capability, most advanced data management architectures include the following components:
- Ingestion and signal processing hub: Sourcing and ingestion solutions for structured and unstructured public, social, private, and device data sources. Can include natural language processing and text analytics capabilities.
- Dynamic data fabric: Solutions that dynamically build a data dictionary across the enterprise while maintaining metadata and linkages. Using data discovery solutions, ontologies, and visualization tools, a dynamic data fabric explores and uncovers multidimensional relationships among data. It also depicts these relationships using interactive technologies and spatial, temporal, and social network displays.
- Data integrity and compliance engine: Capabilities that enhance data quality and fill data gaps to preserve integrity while maintaining regulatory compliance.
- Cognitive data steward: Cognitive technologies that help users understand new compliance requirements and augment human data stewardship by defining data quality and compliance rules. Cognitive data stewards, deployed in tandem with machine intelligence, bots, and other technologies, can automate many traditionally manual governance, oversight, and accountability tasks.
- Enterprise intelligence layer: Machine learning and advanced analytics solutions that illuminate deeper data insights, which can lead to more confident decision-making and real-time action. Among other tasks, the enterprise intelligence layer dynamically builds master data, catalogs, lineage, and security profiles, identifying changes in usage, consumption, and compliance.
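To make one of these components more concrete, here is a minimal sketch of what "dynamically building a data dictionary" could look like at its simplest. It scans records from several sources, notes which fields and value types appear where, and flags fields shared across sources as candidate linkages. The function names, source names, and sample records are all hypothetical; a real data fabric would rely on far richer discovery and ontology tooling.

```python
from collections import defaultdict


def build_data_dictionary(sources):
    """Map each field name to the sources and value types where it appears."""
    dictionary = defaultdict(lambda: {"sources": set(), "types": set()})
    for source_name, records in sources.items():
        for record in records:
            for field, value in record.items():
                entry = dictionary[field]
                entry["sources"].add(source_name)
                entry["types"].add(type(value).__name__)
    return dict(dictionary)


def shared_fields(dictionary):
    """Fields seen in more than one source: candidate links between data sets."""
    return sorted(
        field for field, entry in dictionary.items() if len(entry["sources"]) > 1
    )


# Hypothetical sample data from two systems.
sources = {
    "crm": [{"customer_id": 17, "email": "a@example.com"}],
    "billing": [{"customer_id": 17, "amount": 42.5}],
}

dictionary = build_data_dictionary(sources)
print(shared_fields(dictionary))
```

Because the dictionary is derived from the data rather than declared up front, it can be rebuilt whenever a new source is added, which is the property the component description emphasizes.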
Who should “own” data in my organization? Currently, many organizations employ a data steward who focuses primarily on data quality and uniformity. While this individual may not “own” data in the enterprise, she is the closest thing the company has to a data authority figure. With data increasingly a vital business asset, some organizations are moving beyond simple data management and hiring chief data officers (CDOs) to focus on illuminating and curating the insights the data can yield. Increasingly, CDOs develop data game plans for optimizing collection and aggregation on a global scale; this includes leveraging both structured and unstructured data from external sources. Finally, a CDO’s data game plan addresses geographic and legal considerations about storage.
How do global companies meet regulatory requirements that vary widely by nation? Data hosted on cloud services and other Internet-based platforms is subject to the jurisdiction of the countries where the data is hosted or stored. As straightforward as this may sound, global regulation of data remains a persistently thorny issue for business. Several key questions must be addressed: Who has ownership rights to data? Who is permitted to access data stored in another country? Can a host country lay claim to access the data of a third country that might not be on the same continent as the host nation? There are surprisingly few easy answers.
On May 25, 2018, the European Union will, depending on whom you talk to, either bring welcome clarity to such issues or add yet another layer of regulatory complexity to data management regimes worldwide. On this day, a body of data privacy and usage laws known as the General Data Protection Regulation (GDPR) goes into effect,4 aiming to prevent companies from collecting, processing, or using consumer data without first obtaining consent from the individual to whom the data pertains. And it doesn’t matter whether the data is stored on servers located outside of the EU: if data pertains to an individual in the EU, GDPR rules apply. Failure to abide by GDPR rules can lead to staggering fines: up to 4 percent of a company’s worldwide annual revenue or €20 million (about $22 million), whichever is higher.5
Meanwhile, Australia, China, and many other countries also enforce their respective regulations, and aggressively pursue noncompliant organizations. A recent report by Ovum, an independent analyst and consultancy firm in London, has observed that while the cost of regulatory compliance might be substantial, noncompliance will likely be even more expensive.6
Currently, global companies have several technology-based options to aid in meeting the letter of jurisdictional laws. For example, a sophisticated rules engine deployed directly into cloud servers can dynamically apply myriad rules to data to determine which stakeholders in specific jurisdictions are allowed access to what data. Or companies can segregate data into logical cloud instances by legal jurisdiction and limit cloud access to those data stores to users in each locale.
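The first option, a rules engine that decides which users may see which data, can be sketched in a few lines. This is an illustration under stated assumptions, not a description of any particular product: the `Rule` structure, the region codes, and the policy table are hypothetical, and real rules would come from legal review of each jurisdiction.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    data_jurisdiction: str           # where the data is hosted or regulated
    allowed_user_regions: frozenset  # user locales permitted to read it


# Hypothetical policy table.
RULES = [
    Rule("EU", frozenset({"EU"})),
    Rule("US", frozenset({"US", "EU"})),
]


def may_access(user_region, data_jurisdiction, rules=RULES):
    """Return True only if an explicit rule grants access (default deny)."""
    return any(
        rule.data_jurisdiction == data_jurisdiction
        and user_region in rule.allowed_user_regions
        for rule in rules
    )


print(may_access("EU", "EU"))
print(may_access("US", "EU"))
```

The default-deny design means data from a jurisdiction with no rule on file is inaccessible until someone adds one, which is usually the safer failure mode when regulations differ by country.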
Finally, as any good CDO understands, draconian regulation of a particular jurisdiction may freeze data—with any luck, only temporarily. However, insights gleaned from those data assets are not subject to jurisdictional regulations and can be transferred freely throughout global organizations. With this in mind, shifting the focus from data to insights can help global organizations capitalize on data while remaining in compliance with local law.
As a discipline, data management is not new—nor are half-baked claims to have “reinvented” it. Because we understand that some may greet news of an emerging data trend with a degree of hard-earned skepticism, we will try in the following paragraphs to address concerns, correct common misunderstandings, and set the record straight on enterprise data sovereignty and its possibilities.
Misconception: We’ve already tried using master data solutions to turn lead into gold. What you are describing sounds like another fool’s errand.
Reality: It’s different this time . . . seriously. Here’s why: Many of the master data solutions available during the last 20 years were federated systems with a master data set and separate “working” sets for storing various data types—for example, customer, product, or financial data. The process of reconciling the master and working sets was manual and never-ending. Moreover, all data management rules had to be written prior to deployment, which had the net effect of straitjacketing the entire system from day one. The enterprise data sovereignty trend offers something different. Federated models and manual processes give way to automation and an advanced data management toolkit that includes natural language processing and dynamic data discovery and ontologies, plus advanced machine learning and cognitive capabilities. The system requires less up-front rule-making and can teach itself to manage complexity and maintain regulatory compliance consistently across internal and external ecosystems.
Misconception: Even with automation, you still have frontline people inputting dirty data.
Reality: True, workers inputting and manipulating system data have historically introduced more complexity (and dirty data) than the systems ever did. Moreover, rewarding and penalizing these workers did little to address the issue. In an advanced management system, automation, machine learning, and relational capabilities can help improve data quality by organizing data uniformly and consistently, providing a context for it, and making specific data sets broadly accessible, but only to those who need them. Moreover, when designing their data architectures, companies should consider moving data quality, metadata management, and lineage capabilities away from system centers and relocating them to the edges, where they can correct a human error before it enters enterprise data flows.
Misconception: “Freeing” data will only lead to problems.
Reality: Suggesting that data should be freely accessible does not mean all data should be accessible to everyone across the enterprise at all times. Doing so would overwhelm most people. Perhaps worse, sharing R&D or other sensitive data broadly could tempt some to engage in nefarious acts. But by using metadata, dynamic ontologies and taxonomies, and other relational capabilities, the system can have sufficient context to map data content to enterprise functions and processes. Using this map, the system—not users—determines who gets access to which data sets, and why.
Lessons from the front lines
Data drives competitiveness in Asian markets
In response to increased competition across the Asian market, in 2012 one global manufacturer began looking for ways to amplify its business model and operations. How could it grow the top line, reduce costs, and develop entirely new ways to drive revenue? Leaders found an answer in ever-growing volumes of data and the valuable customer, strategic, and operational insights contained therein. By developing new approaches for managing and leveraging data, the company would be able to develop the insights it needed to achieve its strategic and operational goals.
Step one involved building a new digital foundation that, once complete, would drive repeatable, reliable data collection and usage, while remaining compliant with data regulations across borders.
The project also involved integrating new data sources, constructing a more robust customer master data system with a single view of the customer, and enhancing the protection of data both in storage and in-transit across Europe and Asia. In addition to its far-reaching technical components, the project plan called for transforming the company’s “my data” culture into one that encourages data sharing across the organization.
Since its completion, the digital foundation has enabled greater visibility into trends across functions and geographies, which has subsequently made it easier to identify improvement areas both internally and externally. For example, in 2016 the company launched a series of pilots to increase efficiencies and improve customer service. The first focused on aggregating data from a variety of internal operations and transactions across geographies—such as call centers, customer service departments, and dealer visits—and identifying early-warning indicators of potential quality issues.
Shortly thereafter, the company launched a second pilot in which it placed hundreds of sensors in the field to obtain real-time performance data. It has used these insights to optimize operations, alert customers proactively of potential quality issues, empower customer-facing employees with more in-depth product knowledge, and identify inefficiencies in the supply chain.
Though leaders intend to continue exploring new data management approaches and applying new tactics, their ultimate goal remains consistent: harness data to become more competitive not only within the existing landscape but against newcomers as well.
Making dollars and sense of data
Data is rapidly becoming the hard currency of the digital economy. To manage this currency more efficiently—and to mine it more extensively for valuable insights—leading financial services organizations are modernizing their approaches to data architecture and governance.
Today, many financial services firms have large stores of potentially valuable historical data residing in disparate legacy systems. Much of this data is organized in silos for use by specific groups. For example, sales might “own” customer data while finance would own transactional data. In an effort to make more data accessible to everyone across the enterprise, companies are breaking down traditional information silos. One payment services provider established a Big Data platform with cognitive and machine learning capabilities to improve its data discovery and real-time research. Likewise, a global insurance firm created a “360-degree view” of the customer by connecting customer data across business units and then deploying predictive models to help drive process improvements. This approach also supported the creation of new capabilities in marketing, sales, risk management, fraud detection, underwriting, claims, and other lines of business. Meanwhile, a financial services firm implemented a metadata management repository, critical data lineage capabilities, and an enterprise data identification and tracking system that, together, make it possible to identify and track data across the global enterprise using cognitive capabilities rather than traditional methods. As data moves from one system to another, accountability for that data shifts to whoever will be using it, automatically reorienting accountability to the data itself.
Some firms are also working to advance their data governance strategies. Increasingly strict regulatory oversight has made data quality management a priority at the executive and board levels. More than ever, financial services firms require complete, timely, accurate, and granular data to support regulatory reporting disclosures. To this end, they are exploring ways to automate traditionally manual governance, oversight, and accountability tasks. For example, one investment management company established a governance system in which responsibilities for the global enterprise are held by a community of data stewards who operate within a defined set of policies and procedures. These stewards handle day-to-day data management and governance issues. In parallel, the company implemented an enterprise data identification and tracking system that extends governance workflow across all systems, which helps the data stewards maintain compliance with jurisdictional data privacy and security regulations.
Bill Ruh, chief digital officer of GE and CEO of GE Digital
Data was the impetus for GE’s digital journey. We’re more than just the equipment we sell—we also help our customers run and operate their businesses more efficiently. Almost a decade ago, we started adding more sensors to our machines to better understand their performance, then realized our customers were analyzing that same data in new and different ways. We know the machines inside and out, and we are in the best position to help our customers get every bit of value out of that data and, ultimately, our machines. We knew we needed to do things differently—to evolve our business. So we launched GE Digital, with the goal of mapping the new digital industrial world by integrating our machinery, software, IT, security, fulfillment, and product management capabilities.
We viewed this move through a business lens rather than a technology one, focusing on how to help our customers improve productivity, achieve better outcomes, even create new revenue opportunities. There was no roadmap to follow, but as we started, we quickly realized it would require deep domain knowledge of our equipment to understand both the physics and the analytics of the mined data. It also meant acquiring new capabilities—such as cloud, mobile, and data science—to put in place an infrastructure and to scale it.
Many big companies lack speed but do have scale, so moving into new areas requires leveraging existing assets and then building speed. Big companies tend to operate well in the vertical, with each business unit able to operate semi-independently. But the value of digital is in the horizontal, in the ability to integrate and leverage data across the enterprise: Being digital is the only way to move forward, and that has to be driven from the top of the organization. At the same time, you want to—and need to—enable those verticals to move fast. In the beginning, we didn’t pretend that we knew what belonged in the vertical and what belonged in the horizontal; instead, we recognized the inherent conflict while committing to iterate and evolve our thinking. But we did get comfortable with the idea of reusing, interchanging, and reinforcing a culture of collaboration in order to optimize our existing assets.
We focused first on bringing new capabilities to GE’s services business, which allowed us to collect data, expand our knowledge, and determine what talent and skillsets we needed. We started in 2011 and focused internally the first two years, so we could develop a speed muscle. In 2013, we pivoted to adapt the offerings for our customers. Packaging together the data, analytics, and domain knowledge has immense value, not only in the ability to pull out cost but in the customers’ realization of the benefit to their operations.
For example, GE’s IT group built FieldVision on the Predix platform. Initially aimed at our Power services group, FieldVision became a blueprint for an automation layer for any services team. Now we provide the service to power plants to automate controlled outages, which saved one customer $200 million in one year. Most organizations utilize spreadsheet- or paper-based operations, so FieldVision is truly an outcome-focused solution for data. It allows organizations to put data in the hands of the operator to yield greater efficiencies.
There’s no inherent value in the data itself. The value is in the belief system of what the data represents, and the potential impact if it can be unlocked. Everyone has been talking about the importance of data for decades, but the complexity and cost around ERP has created a skepticism around it. Companies don’t want to get three years into their data sovereignty journey and realize the business isn’t seeing any value from it. You need to think about the transformation you will make, the outcome you will deliver, and the change you will bring. The value of data is sitting out there for everybody to take, but to optimize it, organizations need to be willing to change their operating procedures, and their people need to be willing to change how they work.
As the enterprise’s most valuable asset, data is increasingly at risk for misuse, misplacement, and mishandling. This is due in part to increased bandwidth and computing power, as well as the sheer volume of data available, growing rapidly due to advanced mining capabilities, increased storage, cloud computing, the Internet of Things, and cognitive tools. Additionally, these technologies have extended data’s reach beyond the enterprise to third parties whose practices and protocols are beyond its direct control. These circumstances call for a new approach to data security and governance.
Data governance, the process of ensuring the quality of data throughout its life cycle, isn’t intended to lock away information. In fact, data can play a key role in developing a more robust risk strategy. For example, applying analytics to nontraditional data sources can help build predictive risk models to better target potential threats (by location, population, period of time, and other factors). Similar data could assist in assessing the security protocols of new vendors and partners with whom you share a network.
With such deep data troves, an organization can lose track of its data life cycle. The value of business intelligence has led to a school of thought that if some data is good, more is better, and all the data is best. Accessible, fast-growing data stores can introduce a litany of cyber risk scenarios if the enterprise fails to adopt and adhere to leading practices around its creation/collection, storage, use, sharing, and disposal. Such scenarios have given rise to consumer-centric regulations such as the European General Data Protection Regulation (GDPR) and China’s Cybersecurity Law, both of which are causing some global enterprises to rethink their data management strategies. After years of collecting as much data as possible, organizations are beginning to realize that in some instances data may be more of a liability than an asset.
For decades, many organizations spent their time, money, and resources on defenses—such as network, application, and infrastructure security—designed to keep cyber adversaries out of their networks. But because no organization can be immune to a breach, a more effective approach may be focusing on the data itself. While organizations should continue to implement and maintain traditional security measures, which act as a deterrent to cyber threats, they should also consider the following steps:
Inventory, classify, and maintain sensitive data assets. The first step to protecting data is knowing what you have and where it is. Maintaining a current inventory of data can enable an organization to proceed with data protection in a methodical manner. Additionally, when you identify your most valuable assets—the data with the highest threat vectors—you can shore up your defenses around them. Finally, an accurate inventory facilitates compliance with regulatory requirements such as the GDPR’s provisions for data portability and an individual’s “right to be forgotten”; once data has proliferated throughout an organization, locating all of it quickly for transfer or deletion could be a daunting task without an inventory. To expedite such tasks, organizations should develop and enforce rigorous governance processes that include oversight for data exchanged with third parties.
Implement data-layer preventative and detective capabilities. It is important to implement capabilities such as data classification, data loss prevention, rights management, encryption, tokenization, database activity monitoring, and data access governance. Together, these provide prevention and detection at the last line of defense: the data layer itself.
Reduce the value of sensitive data. One way to reduce the value of sensitive data is to encrypt, tokenize, or obfuscate the data to render it difficult to use when compromised. A second way is to destroy it when it is no longer necessary. Decades-old data rarely generates revenue, but it can be costly to a company’s reputation when compromised.
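As an illustration of reducing the value of sensitive data, the sketch below shows two common patterns: vault-based tokenization, where a random token replaces the value and the mapping lives only in the vault, and keyed pseudonymization, where a one-way hash yields a stable stand-in. The `TokenVault` class and `pseudonymize` function are hypothetical names, the vault is held in memory purely for demonstration, and a production system would need secure storage, key management, and access controls that are out of scope here.

```python
import hashlib
import hmac
import secrets


class TokenVault:
    """Swap sensitive values for random tokens; the mapping stays in the vault."""

    def __init__(self):
        self._token_to_value = {}
        self._value_to_token = {}

    def tokenize(self, value):
        # Reuse the existing token so the same value always maps to one token.
        if value not in self._value_to_token:
            token = "tok_" + secrets.token_hex(8)
            self._value_to_token[value] = token
            self._token_to_value[token] = value
        return self._value_to_token[value]

    def detokenize(self, token):
        return self._token_to_value[token]

    def destroy(self, value):
        """Forget a value once it is no longer needed."""
        token = self._value_to_token.pop(value, None)
        if token is not None:
            del self._token_to_value[token]


def pseudonymize(value, key):
    """One-way keyed hash: stable enough for joins, not directly reversible."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
```

A stolen copy of tokenized records is of little use without the vault, and calling `destroy` on data that is no longer needed leaves any remaining tokens permanently unresolvable.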
Focusing risk strategy on the data layer itself may be one of the most effective ways to secure growing data troves and protect their value to your organization.
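The inventory step described above lends itself to a simple illustration. The sketch below assumes a hypothetical `DataAsset` record and a hand-written sample inventory; in practice the inventory would be populated by discovery and classification tools. It shows how an inventory can answer the two questions a deletion or portability request raises: where does this person's data live, and which third parties received it?

```python
from dataclasses import dataclass


@dataclass
class DataAsset:
    name: str
    location: str           # system or store holding the data
    sensitivity: str        # e.g. "public", "internal", "restricted"
    subject_ids: frozenset  # individuals whose data the asset contains
    shared_with: tuple = () # third parties that receive the data


# Hypothetical inventory entries.
INVENTORY = [
    DataAsset("crm_contacts", "crm-db", "restricted",
              frozenset({"u42", "u77"}), ("mail-vendor",)),
    DataAsset("web_logs", "log-archive", "internal", frozenset({"u42"})),
    DataAsset("product_catalog", "cms", "public", frozenset()),
]


def assets_for_subject(inventory, subject_id):
    """Every asset that holds data about one individual."""
    return [a for a in inventory if subject_id in a.subject_ids]


def third_parties_for_subject(inventory, subject_id):
    """Third parties that would also need to act on a deletion request."""
    assets = assets_for_subject(inventory, subject_id)
    return sorted({party for a in assets for party in a.shared_with})


def by_sensitivity(inventory, level):
    """Names of assets at a given classification level."""
    return [a.name for a in inventory if a.sensitivity == level]


print([a.location for a in assets_for_subject(INVENTORY, "u42")])
print(third_parties_for_subject(INVENTORY, "u42"))
```

Without such a record, each of these lookups becomes a manual search across systems, which is the daunting task the inventory is meant to avoid.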
The diverse, nascent-stage, and dynamic nature of global data privacy, residency, and usage regulations is a major driver of the enterprise data sovereignty trend. Across regions, there is acknowledgment of its profound impact, even while investments tend to focus on tactical responses to existing or looming government policies. From the 2018 deadlines for the European Union’s General Data Protection Regulation to recent Australian privacy laws, some believe that these country-specific responses are necessary to navigate the void created by industry regulations that often lag behind technology advances. In light of these complex laws, however, many organizations are realizing they don’t know, much less control, what data exists within the enterprise, where it sits, and how it is being used across business units, geographies, or with third parties.
The range of adoption timelines may reflect the global lack of technical skills and reference use cases within specific country and industry intersections. Region- and country-specific challenges play a role in these varying timelines. In Northern Europe, for example, historical context related to civil liberties, privacy, and nation-state data collection may make the topic of data sovereignty particularly sensitive and highly politicized. Across the Americas, Europe, and Asia Pacific, active discussions are under way between the government and private sectors to shape regulation. In all corners of the world—including South Africa, Italy, Brazil, and China—public providers are racing to build “national” clouds in advance of evolving privacy laws. Region-specific timeframes and barriers reflect these considerations, indicating either the expected window for investments and policies to mature or a cautious buffer due to the complexities involved.
Where do you start?
For companies looking to boost data management capabilities, the holy grail is creating the architecture and processes required to handle growing volumes of data in an agile, efficient fashion. Yet for many organizations, the distance between current capabilities and that goal may seem daunting. The following steps can help you lay the groundwork for the journey ahead:
- Pay data debt. CIOs think a lot about technical debt—the quick fixes, workarounds, and delayed upgrades that bedevil legacy systems and undermine efficiency. Many companies face comparable challenges with data debt. Consider the amount of money you are spending on one-off data repositories—or the cost, in terms of both time and efficiency, of creating reports manually. A first step in transforming your data management systems is assessing (broadly) just how much data sprawl you have. How many interfaces and feeds connect disparate repositories and systems? With an inventory of systems and data, you can try to quantify how much manual effort is expended daily/monthly/yearly to keep the sprawl intact and functioning. This information will help you better understand your current data capacity, efficiency (or lack thereof), and costs, and provide a baseline for further analysis.
- Start upstream. Data scientists use technologies such as text and predictive analytics and machine learning to analyze largely unstructured data. This process typically begins at the end of the information supply chain—the point at which users tap into data that has been aggregated. By deploying these and other technologies at the beginning of the information supply chain—where an organization initially ingests raw data—companies can start the process of linking, merging and routing data, and cleansing bad data before data scientists and users begin working with it. This approach helps impose some structure by creating linkages within raw data early on, laying the groundwork for greater storage and management efficiencies. Also, when you can improve data quality at the point of entry by correlating it and performing relationship analysis to provide more context, data scientists will likely end up spending less time organizing data and more time performing advanced analysis.
- Use metadata, and lots of it. Adding metadata to raw data at the point of ingestion can help enhance data context, particularly in unstructured data such as random documents, newsfeeds, and social media. Greater context, in turn, can help organizations group and process thematically similar information more efficiently, as well as enable increased process automation.
- Create a cognitive data steward. Raw data is anything but uniform. Any raw data set is likely rife with misspellings, duplicate records, and inaccuracies. Typically, data stewards manually examine problematic data to resolve issues and answer questions that may arise during analysis. Increasingly, we see data stewards use advanced cognitive computing technologies to “assist” in this kind of review—there’s only so much a human can do to resolve these issues. The ability to automate this process can free up data stewards to focus on more valuable tasks.
- Help users explore data more effectively. Navigating and exploring data can be challenging, even for experienced users. Providing a natural language interface and cognitive computing tools to help guide users as they undertake predictive modeling and advanced searches can turn laymen into data scientists—and help companies extract more value from their data management investments.
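The "start upstream" and metadata steps above can be pictured with a small ingestion function. This is a sketch under simplifying assumptions: the `ingest` function and its metadata fields are hypothetical, and duplicate detection here catches only records that match after whitespace and case normalization, whereas a cognitive data steward would also handle misspellings and near-duplicates.

```python
import hashlib
from datetime import datetime, timezone


def ingest(raw_text, source, seen_hashes):
    """Attach metadata to a raw record; return None for exact duplicates."""
    # Normalize whitespace and case so trivially different copies match.
    normalized = " ".join(raw_text.split()).lower()
    digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
    if digest in seen_hashes:
        return None
    seen_hashes.add(digest)
    return {
        "content": raw_text,
        "metadata": {
            "source": source,
            "ingested_at": datetime.now(timezone.utc).isoformat(),
            "content_hash": digest,
            "word_count": len(normalized.split()),
        },
    }


seen = set()
first = ingest("Quarterly  outage report", "newsfeed", seen)
second = ingest("quarterly outage report", "social", seen)
print(first["metadata"]["source"], second)
```

Because the source, timestamp, and hash travel with each record from the moment it enters the system, downstream users can group, trace, and filter records without first reconstructing where they came from.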
As data grows exponentially in both volume and strategic importance, enterprise data sovereignty offers companies a blueprint for transforming themselves into data-driven organizations. Achieving this goal may require long-term investments in data integration, cataloging, security, lineage, and other areas. But with focus and careful planning, such investments can generate ongoing ROI in the form of a dynamic data management construct that is constantly evolving, learning, and growing.