The power of knowledge graphs: Tracking data lineage in financial services
Efficiently expand your data lineage insights
A common issue today’s data-driven financial institutions face is tracking data throughout its development lifecycle. You need to track data as it is utilised, manipulated and transformed throughout the organisation. Knowledge graphs are a powerful tool to trace your data journey in the right context, to identify missing links and provide smart recommendations while improving the quality and auditability of data flows.
What is data lineage?
DAMA, the global data management community, defines data lineage as “the pathway along which data moves from its point of origin to its point of usage, sometimes called the data chain. Understanding the data lineage requires documenting the origin of data sets, as well as their movement and transformation through systems where they are accessed and used.”
Regulations such as BCBS#239, GDPR and Solvency II force financial institutions to provide insights into their (risk) data-aggregation processes. These processes involve many different users, systems and data transformations, as well as extensive data and system legacy. This makes it challenging for you to provide regulators and management with the required level of transparency on the data integration and aggregation processes used for risk modelling and regulatory reporting.
Data lineage provides insights into which business and technical transformation logic has been applied to your data, in which layer and by whom. In a previous article, we elaborated on the importance of data lineage for financial institutions and discussed different types, such as horizontal and vertical lineage.
One of the main challenges for modelling data lineage, and extracting insights from it, is dealing with complex data that is:
- Highly connected. During its journey from requirements to specifications and delivery, multiple transformations and mappings take place between data elements. This creates a complex network of data elements. The deeper the understanding of these relationships, the more powerful the insights for data lineage.
- Diverse and heterogeneous. Depending on how an organisation is structured and managed, data is fed from multiple sources and channels, each with its own data model and format. This diversity adds a new dimension to the complexity of data lineage, as it affects how data is gathered, analysed and integrated.
- Dynamic. As organisations are modelling their data lineage, the data artefacts, their underlying model and links between them can change and evolve. Dealing with the history and evolution of data is another driver of complexity for realising up-to-date data lineage.
- Contextual. Only when data is presented within context does it become meaningful for data lineage. Therefore, metadata is a key ingredient and prerequisite for proper data lineage. Knowing about the semantics and meaning of data during its lifecycle provides more automation possibilities for modelling and interpreting data lineage.
Knowledge graphs tackle the above challenges by allowing for scalable processing of interconnected data and enabling semantic data representation and integration.
What is a knowledge graph?
A knowledge graph is a means to connect and represent knowledge in an area of interest using a graph-like structure. It is typically built on top of existing databases to link data at web scale, combining structured and/or unstructured information. As opposed to the more commonly used relational data models, a graph model is built as a collection of concepts or entities, and the relationships between them.
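To make the contrast with a relational model concrete, here is a minimal sketch, assuming a toy set of entities that does not come from any real system. It holds the graph as subject-predicate-object triples and answers questions by following relationships rather than joining tables. All names (`crm_system`, `feeds`, `build_index`, `neighbours` and so on) are hypothetical and chosen for illustration only.

```python
from collections import defaultdict

# Hypothetical triples (subject, predicate, object); illustrative names only.
TRIPLES = [
    ("crm_system", "produces", "customer_table"),
    ("customer_table", "feeds", "risk_data_mart"),
    ("risk_data_mart", "feeds", "regulatory_report"),
    ("regulatory_report", "governed_by", "BCBS239"),
]


def build_index(triples):
    """Index triples by subject so relationships can be followed directly."""
    index = defaultdict(list)
    for subject, predicate, obj in triples:
        index[subject].append((predicate, obj))
    return index


def neighbours(index, entity, predicate=None):
    """Return entities linked from `entity`, optionally filtered by predicate."""
    return [obj for pred, obj in index.get(entity, []) if predicate in (None, pred)]


index = build_index(TRIPLES)
print(neighbours(index, "customer_table"))
print(neighbours(index, "regulatory_report", "governed_by"))
```

A production knowledge graph would normally sit in a dedicated graph or RDF store; the in-memory dictionary here only illustrates the entity-and-relationship shape of the data.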
Knowledge graphs allow the processing and representation of data and knowledge in a format that is very close to the way a human brain processes and stores information. They act as hybrid technology combining database management, network analysis and AI for:
- bridging diverse data silos regardless of data formats, serialisations, conceptualisations and technology ecosystems
- investigating interconnected data to discover insightful patterns
- deriving context-relevant knowledge from the large amounts of integrated data.
Knowledge graphs form the foundation of many modern data-integration and analytics systems. Gartner predicts that by 2025, graph technologies will be used in 80 per cent of data and analytics innovations, up from 10 per cent in 2021, facilitating rapid decision-making across the organisation. Figure 1 also lists graph technologies as a critical enabler, with a wide range of potential applications that will take three to six years to reach majority adoption. “Critical enablers act as an additive force to bring the emerging technologies and trends together, and heighten the benefits by reshaping business practices, processes, methods, models and/or functions in markets where they are applied.”1
In a recent article, we elaborated on different use cases of knowledge graphs for financial services.
How can knowledge graphs support data lineage?
Knowledge graphs enable wider and deeper data-lineage insights by tackling the challenges of dealing with complex data:
- Highly connected. Graph technologies put data relationships at the centre. Knowledge graph storage engines store and index data points and the connections between them efficiently, optimising them for querying and analytics on end-to-end data lineage. Different kinds of knowledge graph-based analysis allow you to identify patterns in interconnected data. Methods such as path analysis, connectivity analysis, community analysis and centrality analysis are applied to the underlying graph of data lineage.
- Diverse and heterogeneous. Knowledge graphs allow data distribution, harmonisation, integration and storage at scale when dealing with diverse data sources in data lineage (see Figure 2). Linked data standards, such as URIs and the RDF data model, are used to represent data in a single interchangeable format that is understood by machines and humans. Additionally, multi-model graph databases are incorporated to support multiple data models against a single integrated backend.
- Dynamic. Knowledge graphs provide an agile and flexible data-management model to bring together large volumes of data in a variety of forms. As data and connections evolve over time, the topology of data lineage graphs transforms accordingly. The relationship-first nature of graphs allows the efficient updating of references to data.
- Contextual. Incorporated semantic technologies allow data lineage to be stored in a rich construct, contextually and conceptually. Ontologies, vocabularies and other kinds of knowledge-representation techniques help manage metadata efficiently across multiple sources, while allowing for reasoning and automated knowledge discovery throughout the end-to-end data lineage.
Figure 2: The integration of diverse data silos in knowledge graph-based data lineage
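The path and centrality analyses mentioned above can be sketched on a small lineage graph. This is a minimal illustration, assuming a handful of made-up data elements and directed "flows into" edges. It uses only the Python standard library rather than a graph database, and the element names and helper functions (`adjacency`, `reachable`, `degree_centrality`) are hypothetical.

```python
from collections import defaultdict, deque

# Hypothetical lineage edges: (source element, target element).
EDGES = [
    ("loan_system.balance", "staging.exposure"),
    ("fx_feed.rate", "staging.exposure"),
    ("staging.exposure", "risk_mart.rwa"),
    ("risk_mart.rwa", "report.capital_ratio"),
    ("ledger.capital", "report.capital_ratio"),
]


def adjacency(edges, reverse=False):
    """Build an adjacency map; reverse=True follows edges towards the origin."""
    graph = defaultdict(set)
    for src, dst in edges:
        if reverse:
            graph[dst].add(src)
        else:
            graph[src].add(dst)
    return graph


def reachable(graph, start):
    """Path analysis via breadth-first search: all elements reachable from start."""
    seen, queue = set(), deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def degree_centrality(edges):
    """A simple centrality measure: incoming plus outgoing links per element."""
    degree = defaultdict(int)
    for src, dst in edges:
        degree[src] += 1
        degree[dst] += 1
    return dict(degree)


upstream = reachable(adjacency(EDGES, reverse=True), "report.capital_ratio")
downstream = reachable(adjacency(EDGES), "fx_feed.rate")
degrees = degree_centrality(EDGES)

print(sorted(upstream))    # everything the report field depends on
print(sorted(downstream))  # everything affected by a change in the FX rate
print(max(degrees, key=degrees.get))
```

Upstream traversal answers "where does this reported figure come from?", while downstream traversal supports impact analysis. Degree is only one of several possible centrality measures and is used here because it needs no external library.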
At Deloitte, we are already supporting our Financial Services clients with a knowledge graph-based platform to extract insights from end-to-end data lineage. This overlay dashboard is built on top of existing sources and applications, such as Collibra, and is used for horizontal and vertical data lineage to support:
- checking the compliance with policies and regulatory requirements
- tracking progress end-to-end and over time from design towards realisation
- performing advanced data analytics
1. Identifying dependencies and redundancies among data points
2. Identifying gaps and missing links, with recommendations on how to fix them
3. Identifying critical data elements within the data-delivery chain.
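As an illustration of what such analytics could look like in principle, the sketch below flags possible lineage gaps and ranks elements by how much depends on them. It is a simplified heuristic under stated assumptions, not a description of the platform itself: the edges, the set of registered sources and the functions `find_gaps` and `downstream_counts` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical lineage edges: (source element, target element).
EDGES = [
    ("loan_system.balance", "staging.exposure"),
    ("fx_feed.rate", "staging.exposure"),
    ("staging.exposure", "risk_mart.rwa"),
    ("risk_mart.rwa", "report.capital_ratio"),
    ("manual_adjustment", "report.capital_ratio"),
    ("staging.exposure", "report.large_exposures"),
]

# Assumption: origins that have been documented as authoritative sources.
REGISTERED_SOURCES = {"loan_system.balance", "fx_feed.rate"}


def find_gaps(edges, registered_sources):
    """Elements with no upstream link that are not documented origins."""
    targets = {dst for _, dst in edges}
    nodes = {node for edge in edges for node in edge}
    return sorted(n for n in nodes if n not in targets and n not in registered_sources)


def downstream_counts(edges):
    """Count the elements that depend, directly or indirectly, on each element."""
    children = defaultdict(set)
    for src, dst in edges:
        children[src].add(dst)

    def descendants(node, seen):
        for child in children.get(node, ()):
            if child not in seen:
                seen.add(child)
                descendants(child, seen)
        return seen

    nodes = {node for edge in edges for node in edge}
    return {node: len(descendants(node, set())) for node in nodes}


gaps = find_gaps(EDGES, REGISTERED_SOURCES)
counts = downstream_counts(EDGES)

print(gaps)  # candidates for a missing upstream link
for element, count in sorted(counts.items(), key=lambda item: (-item[1], item[0])):
    print(element, count)
```

In this toy data the undocumented `manual_adjustment` input surfaces as a gap, and elements with many downstream dependants are natural candidates for critical data elements. A real assessment would also weigh business context and metadata.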
Look out for our next article, where we will elaborate on the above points, as well as discuss different features of our Deloitte Data Insights Monitor Solution.