
How can data mesh make data valorization use cases like AI integration easier?

To the point

 

Current data repository solutions on the market are reaching their limits as the global data volume continues to increase exponentially, which creates problems for companies. As data warehouses and data lakes become more and more complex, data-driven decision making becomes slower and harder, with a direct impact on companies’ businesses.

A new architecture called data mesh has been defined to overcome these challenges by shifting data responsibility to business domain teams and treating data as a product. This distributed architecture rethinks the way data should be managed in an organization. A data mesh or data fabric brings many benefits, chiefly by enabling the use of distributed data through abstracted multi-cloud access. With these design patterns, data can be accessed by, or shared with, both internal and external applications for a wide variety of analytical and operational use cases, increasing the versatility of an organization’s master data and keeping data semantically aligned across the company.

In the era of big data, the volume of data produced globally increases exponentially each year. According to International Data Corporation (IDC) forecasts, the volume of data worldwide is expected to exceed 175 zettabytes by 2025. To put this figure into perspective, if all this data were stored on DVDs and placed side by side, they would form a line circling the Earth 222 times [1]. Companies produce data of vast quantity and scope, and when it is managed effectively it can generate powerful business insights and bring a competitive advantage. The key to maximizing the data’s potential is easy access to clear and coherent data. Having the right data architecture and data governance in place is an essential starting point for ensuring consistent data quality, and it makes it possible to transform data into a highly versatile product.

Data mesh continues to gain popularity with companies because its architectural design helps to overcome the pitfalls of current data repository solutions. By taking advantage of data mesh, you could make a bigger impact on your business. So, what is data mesh, what are its benefits, and how does it differ from current data architecture solutions?

Current company data architectures

The best-known platforms for centralizing data from multiple sources are the data repository solutions known as data warehouses and data lakes. Data warehouses (DWH) are reliable data management systems because they aggregate large volumes of data from multiple sources into a single repository; their data is structured, historically unified and ready to use [2]. In contrast, a data lake contains raw data that can be structured or unstructured. It provides a massive data store, but retrieving data takes time because data lakes have a flat architecture [3].

These repository solutions were adopted by many companies at the beginning of the big data era to develop their business intelligence and support decision making. While these platforms play an essential role in storing and analyzing data, they are reaching their limits as the number of data sources and the volume of data increase. It has become unrealistic to integrate everything into a single data platform, hence the idea of unifying data sources under a common semantic umbrella.

The limitations of data warehouses and data lakes

As the volume of available data expands, data repository solutions are becoming increasingly complex. Within the corporate context, creating data products that comply with organizational and regulatory standards becomes a time-consuming and arduous activity. A more complex structure also reduces scalability and agility, which is problematic in a context where decisions need to be made quickly [4]. Furthermore, new types of data sources emerge every day and need to be captured and understood in order to leverage their potential.

When data is leveraged through these centralized platforms, the duty to ingest, transform and deliver data to the different business teams falls on the central IT organization. When business domain data owners circumvent IT, or miscommunication leads to the use of “shadow IT,” more disparate data sources are created that do not comply with internal processes [4, 5]. New architectures such as data mesh and data fabric, built on distributed data governance, were developed to overcome some of these shortcomings.

What is data mesh and how does it help to overcome challenges?

Data mesh has gained popularity since it was introduced in 2019 [5], thanks to its new way of managing data: a democratized approach backed by a centralized, self-service infrastructure. Its main objective is to build business data products without prescribing the technology involved [6]. This is facilitated by three layers:

Figure 1: Layers of data mesh architecture and their benefits

Data mesh architecture is based on four principles designed to overcome the disadvantages of other types of data repositories [7, 8, 9]:

  • Domain ownership: The responsibility of providing compliant, accurate data is distributed across different domain experts rather than using a centralized data team.

    Benefits: Assigning ownership to the people most familiar with the data lets their domain expertise feed directly into use cases relevant to business needs. This streamlines responsibility and decision-making authority and provides insight into how and where data should be used.

    Example: The sustainability department of an investment company provides data to the Compliance department for ESG reporting. As the end-user, the Compliance department only consumes the data it receives. The sustainability department is responsible for data accuracy and access because it owns that business domain.
  • Data as a product: Thinking of data as a product yields ready-to-use data best suited to the needs of end-users while meeting quality and service-level agreement (SLA) criteria.

    Benefits: Data is high-quality, understandable, and trustworthy for analytics purposes.

    Example: Taking the previous example, the sustainability department provides its data with a package of capabilities (e.g., documentation, access, and confidence in the data) and chooses the most suitable way to distribute it across the company (files, database storage, etc.). As a result, end-users are in the best position to consume the data and unlock its full potential.
  • Self-service data infrastructure platform: As each business unit is responsible for its data, the data platform provides standardized capabilities (pipelines, domain-agnostic tools, etc.) that let units publish their data products autonomously.

    Benefits: Better agility, scalability, accessibility, and lower infrastructure cost.

    Example: Initially, deploying data required infrastructure with specialized skills, and business teams used different tools, which led to a lack of interoperability. A self-service data platform provides support that simplifies the provisioning workflow. The underlying complexity of the infrastructure is hidden, so business teams can work autonomously.
  • Federated governance: This ensures compliance with regulations and company policies. A clear data governance model that specifies the format of all data products provided by business teams is essential, because standardized processes bring more uniformity and clarity. Federated governance also ensures that those with the domain expertise remain responsible for the data that is leveraged across the organization.

    Benefits: Interoperability improves and data from different departments is simpler to correlate.

    Example: You are working for the Sales department and would like to use data from the Marketing team. As data follows the same guidelines across the company, the data analysis is straightforward.
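The Sales and Marketing example can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: it assumes a hypothetical company-wide guideline requiring every customer-related data product to expose a string `customer_id`, and the function names, fields and records are invented for the example. Because both domains follow the same convention, combining their products is a simple lookup rather than a bespoke mapping exercise.

```python
# Hypothetical company-wide guideline agreed through federated governance:
# every customer-related data product exposes `customer_id` as a string.
REQUIRED_FIELDS = {"customer_id": str}


def conforms(record: dict) -> bool:
    """Return True if a record follows the hypothetical shared guideline."""
    return all(isinstance(record.get(name), kind) for name, kind in REQUIRED_FIELDS.items())


def revenue_per_campaign(sales: list[dict], marketing: list[dict]) -> dict[str, float]:
    """Attribute Sales revenue to Marketing campaigns via the shared key."""
    if not all(conforms(r) for r in sales + marketing):
        raise ValueError("a record does not follow the shared guideline")
    # Both domains use the same key, so the join is a plain dictionary lookup.
    campaign_of = {r["customer_id"]: r["campaign"] for r in marketing}
    totals: dict[str, float] = {}
    for r in sales:
        campaign = campaign_of.get(r["customer_id"], "unattributed")
        totals[campaign] = totals.get(campaign, 0.0) + r["amount"]
    return totals


# Hypothetical records published by each domain's data product.
sales = [
    {"customer_id": "C1", "amount": 120.0},
    {"customer_id": "C2", "amount": 80.0},
    {"customer_id": "C3", "amount": 50.0},
]
marketing = [
    {"customer_id": "C1", "campaign": "spring_promo"},
    {"customer_id": "C2", "campaign": "spring_promo"},
]

print(revenue_per_campaign(sales, marketing))
# {'spring_promo': 200.0, 'unattributed': 50.0}
```

A record that breaks the guideline (for example an integer `customer_id`) is rejected up front instead of silently producing a wrong join.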

By implementing these four principles and treating data as a group of repositories containing data products, data mesh offers concrete solutions through restructuring, alleviating companies’ most pressing data architecture problems.
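To make the idea of domain-owned data products more concrete, the following sketch models a data product descriptor and a catalog in which domains publish them, reusing the ESG example from the sustainability department. `DataProduct`, `MeshCatalog`, the three publication checks and the sample values are hypothetical; a real self-service platform would enforce whatever policies were agreed through federated governance rather than these placeholders.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DataProduct:
    """Hypothetical descriptor of a domain-owned data product."""
    name: str
    owner_domain: str               # domain ownership: the accountable team
    documentation: str              # data as a product: consumers can understand it
    output_ports: tuple[str, ...]   # distribution channels chosen by the owner
    freshness_sla_hours: int        # assumed service-level agreement on freshness


@dataclass
class MeshCatalog:
    """Hypothetical catalog in which every domain publishes its data products."""
    products: dict[str, DataProduct] = field(default_factory=dict)

    def publish(self, product: DataProduct) -> None:
        """Register a product after checking a few assumed global policies."""
        problems = []
        if not product.documentation.strip():
            problems.append("documentation is missing")
        if not product.output_ports:
            problems.append("no output port declared")
        if product.freshness_sla_hours <= 0:
            problems.append("freshness SLA must be positive")
        if problems:
            raise ValueError(f"{product.name}: " + "; ".join(problems))
        self.products[product.name] = product

    def by_domain(self, domain: str) -> list[DataProduct]:
        """Let consumers discover the products owned by a given domain."""
        return [p for p in self.products.values() if p.owner_domain == domain]


catalog = MeshCatalog()
catalog.publish(DataProduct(
    name="esg_scores",  # hypothetical product name
    owner_domain="sustainability",
    documentation="ESG score per portfolio company, refreshed daily.",
    output_ports=("files", "database"),
    freshness_sla_hours=24,
))

# The Compliance team only consumes; the sustainability domain stays accountable.
for product in catalog.by_domain("sustainability"):
    print(product.name, product.output_ports)
```

Keeping the checks in the shared `publish` step, rather than in each domain's own pipeline, is one way to reflect the split described above: domains own the content of their products, while the platform applies the common rules.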

Barriers to overcome

Despite these benefits, one should be aware of potential barriers before switching to a data mesh. To be adopted more easily by companies, a data mesh implementation should address the following difficulties [4]:

  • Duplication of data across different domains: Data is reused between business domains, which can lead to redundancy and increased management costs.
  • Cross-domain analytics: The data model can lose interpretability as it is split across business domains, which makes analyses spanning several domains harder.
  • Implementation of federated data governance and quality adherence: Governance must be properly defined; otherwise, technical gaps can appear between data products.
  • Significant level of change management involved: Implementing data mesh requires many organizational changes, so management should be well informed.
  • Technology choices shape the overall data capabilities of the data platform: Technologies should be chosen wisely, leaving enough room for future applications, reducing potential ‘lock-in’ to a single provider, and remaining flexible enough to accommodate architectural changes.
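The first barrier, duplication across domains, can at least be made visible. A minimal sketch, assuming each domain can expose its datasets as JSON-serializable records: compute an order-independent fingerprint of each dataset and group identical ones. The function names and sample data are hypothetical, and the check only catches exact copies, not near-duplicates or differently modeled versions of the same entity.

```python
import hashlib
import json
from collections import defaultdict


def fingerprint(rows: list[dict]) -> str:
    """SHA-256 over canonical JSON rows; row order and key order are ignored."""
    canonical = sorted(json.dumps(r, sort_keys=True) for r in rows)
    digest = hashlib.sha256()
    for line in canonical:
        digest.update(line.encode("utf-8"))
        digest.update(b"\n")  # separator; json.dumps never emits a raw newline
    return digest.hexdigest()


def find_duplicates(datasets: dict[tuple[str, str], list[dict]]) -> list[list[tuple[str, str]]]:
    """Group (domain, dataset) keys whose contents are identical."""
    groups: dict[str, list[tuple[str, str]]] = defaultdict(list)
    for key, rows in datasets.items():
        groups[fingerprint(rows)].append(key)
    return [sorted(keys) for keys in groups.values() if len(keys) > 1]


# Hypothetical datasets published by three domains.
datasets = {
    ("sales", "customers"): [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Lin"}],
    ("marketing", "customer_copy"): [{"name": "Lin", "id": 2}, {"name": "Ada", "id": 1}],
    ("finance", "invoices"): [{"invoice": 10, "id": 1}],
}

print(find_duplicates(datasets))
# [[('marketing', 'customer_copy'), ('sales', 'customers')]]
```

Flagged copies can then be discussed with the owning domains, for example by having the consuming domain read the original product instead of maintaining its own copy.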

Data valorization and AI integration through data mesh implementation

Data mesh designs valorize data by enhancing its versatility across businesses. As data mesh provides better interoperability, organizations should consider using technology-agnostic products to ensure that they are not ‘locked-in’ to a single provider. As the flexibility to manage data increases, businesses can extend their capabilities with new tools such as AI solutions. This opens up many possibilities that help organizations leverage their data and create greater business impact through informed decision making.
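As an illustration of how technology-agnostic data products can ease AI integration, the sketch below assumes a hypothetical `DataProductPort` interface with a single `read()` method. One product is served from CSV text and another from in-memory records (standing in for file-based and database-backed distribution), yet the code that assembles one feature row per customer for a model is the same for both. All class names, field names and values are assumptions for illustration; no model training is shown.

```python
import csv
import io
from collections.abc import Iterable
from typing import Protocol


class DataProductPort(Protocol):
    """Hypothetical technology-agnostic output port of a data product."""
    def read(self) -> Iterable[dict]: ...


class CsvPort:
    """Hypothetical port backed by CSV text (stands in for a file-based product)."""
    def __init__(self, text: str, numeric: tuple[str, ...] = ()) -> None:
        self._text = text
        self._numeric = numeric  # columns to convert from CSV strings to floats

    def read(self) -> Iterable[dict]:
        for row in csv.DictReader(io.StringIO(self._text)):
            yield {k: float(v) if k in self._numeric else v for k, v in row.items()}


class RecordsPort:
    """Hypothetical port backed by records (stands in for a database-backed product)."""
    def __init__(self, records: list[dict]) -> None:
        self._records = records

    def read(self) -> Iterable[dict]:
        return iter(self._records)


def build_features(ports: dict[str, DataProductPort], key: str = "customer_id") -> dict[str, dict]:
    """Merge rows from several data products into one feature row per key."""
    features: dict[str, dict] = {}
    for product_name, port in ports.items():
        for row in port.read():
            entity = features.setdefault(row[key], {})
            for column, value in row.items():
                if column != key:
                    # Prefix with the product name so columns stay traceable to their owner.
                    entity[f"{product_name}.{column}"] = value
    return features


ports: dict[str, DataProductPort] = {
    "sustainability": CsvPort("customer_id,esg_score\nC1,0.8\nC2,0.4\n", numeric=("esg_score",)),
    "sales": RecordsPort([{"customer_id": "C1", "orders": 3}]),
}

print(build_features(ports))
# {'C1': {'sustainability.esg_score': 0.8, 'sales.orders': 3}, 'C2': {'sustainability.esg_score': 0.4}}
```

If the sustainability domain later moved its product to another storage technology, only its port would change; `build_features` and any model consuming its output would not.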

References

[1] IDC, "The Digitization of the World - From Edge to Core," IDC, Nov. 2018. [Online]. Available: https://www.seagate.com/files/www-content/our-story/trends/files/idc-seagate-dataage-whitepaper.pdf. [Accessed 10 June 2022].

[2] Qlik, "Data Warehouse," [Online]. Available: https://www.qlik.com/us/data-warehouse. [Accessed 18 May 2022].

[3] Qlik, "Data Lake," [Online]. Available: https://www.qlik.com/us/data-lake. [Accessed 18 May 2022].

[4] Deloitte, "From data mess to a data mesh," Deloitte, [Online]. Available: https://www2.deloitte.com/nl/nl/pages/strategy-analytics-and-ma/articles/from-data-mess-to-a-data-mesh.html. [Accessed 16 May 2022].

[5] Z. Dehghani, "How to Move Beyond a Monolithic Data Lake to a Distributed Data Mesh," martinfowler.com, 20 May 2019. [Online]. Available: https://martinfowler.com/articles/data-monolith-to-mesh.html. [Accessed 16 May 2022].

[6] Gartner, "Quick Answer: Are Data Fabric and Data Mesh the Same or Different?," Gartner, 1 Nov. 2021. [Online]. Available: https://www.gartner.com/doc/reprints?id=1-292DG4LD&ct=220209&st=sb&utm_campaign=TY%20Mailers&utm_medium=email&_hsmi=182672238&_hsenc=p2ANqtz--00qJgAzU26v3DoBBLqASGm_vJVdhGQV5gnAirC2zfIEj_o0wChJj9zj2wGnWiCV18YxKDIKMGFZDzhn6xkoGVW--VMw&utm_content=182672238. [Accessed 9 June 2022].

[7] Z. Dehghani, "Data Mesh Principles and Logical Architecture," martinfowler.com, 3 Dec. 2020. [Online]. Available: https://martinfowler.com/articles/data-mesh-principles.html#DataAsAProduct. [Accessed 20 June 2022].

[8] J. Christ, L. Visengeriyeva and S. Harrer, "Data Mesh Architecture," [Online]. Available: https://www.datamesh-architecture.com/. [Accessed 17 May 2022].

[9] Starburst Data, "What is Data Mesh?," [Online]. Available: https://www.starburst.io/learn/data-fundamentals/what-is-data-mesh/. [Accessed 18 May 2022].