Data lineage: Data origination and where it moves over time
How to trace your data journey and improve the quality of your reports
Regulations such as BCBS 239, GDPR and Solvency II force financial institutions to provide insight into their (risk) data aggregation processes. In this article we elaborate on the concepts of metadata management and data lineage and on how they can help your organization better understand the quality of its data flows and make them more transparent. We explain how data lineage enables a better control process, lowers the cost of resolving errors and gives internal management and supervisory bodies confidence in the reported figures.
The high number of systems and data transformations, together with extensive data and system legacy, makes it challenging for financial institutions to provide regulators with the required level of transparency on the data integration and aggregation processes used for reporting. Data lineage provides insight into which business and technical transformation logic has been applied to your data, and by whom.
What is data lineage?
In our previous article we elaborated on the definition and usage of metadata and its increasing importance in the fast-changing regulatory environment.
In general, metadata is the prerequisite for proper data lineage, which the international association of data management professionals (DAMA, 2017) defines as “the pathway along which data moves from its point of origin to its point of usage”.1 Data lineage also describes what happens to data as it flows through the organization.
Figure 1 demonstrates a typical data-driven reporting environment in which source data is transformed in such a way that it is accepted by the data model, which in turn is deployed by translating reporting requirements into a semantic layer. Three types of data lineage are considered when aiming for a data-driven reporting process:
- Vertical lineage demonstrates the origination of a data requirement from regulations towards deployment in a data model on a metadata level.
- Horizontal lineage shows the mapping of source data to target output on a metadata level. It shows the functional transformation logic of how source data is transformed towards a target end state.
- Physical lineage demonstrates the actual data flow from source system to reporting solution, supporting the metadata architecture of the data driven reporting environment.
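The three lineage types above can be captured in one simple record structure. The sketch below is a minimal illustration, not a reference model: the class names, the example regulation, and every system, table and attribute name are hypothetical and were chosen only to show how each type links an origin to a target.

```python
from dataclasses import dataclass
from enum import Enum


class LineageKind(Enum):
    VERTICAL = "vertical"      # requirement -> data model (metadata level)
    HORIZONTAL = "horizontal"  # source attribute -> target attribute (metadata level)
    PHYSICAL = "physical"      # actual data flow between systems


@dataclass(frozen=True)
class LineageLink:
    kind: LineageKind
    origin: str
    target: str
    logic: str = ""  # functional transformation logic, if any


# Hypothetical links; every name below is invented for illustration.
LINKS = [
    LineageLink(LineageKind.VERTICAL,
                "regulation:large_exposure_reporting",
                "model:Exposure.amount"),
    LineageLink(LineageKind.HORIZONTAL,
                "source:loan.outstanding_balance",
                "model:Exposure.amount",
                "convert to EUR at reporting-date rate"),
    LineageLink(LineageKind.PHYSICAL,
                "system:core_banking.loans",
                "system:reporting_mart.exposures"),
]


def links_of_kind(links, kind):
    """Return only the links of one lineage type."""
    return [link for link in links if link.kind is kind]


def describe(link):
    """Render one link as a readable line."""
    suffix = f" [{link.logic}]" if link.logic else ""
    return f"{link.kind.value}: {link.origin} -> {link.target}{suffix}"


if __name__ == "__main__":
    for link in LINKS:
        print(describe(link))
```

Keeping the transformation logic on the link itself is one possible design choice; it lets the horizontal lineage answer "what was done to this attribute" without a separate lookup.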
Why is data lineage important?
Data lineage has many applications that are important for unlocking the full potential of a data-driven reporting process:
- Better anticipation of changing regulatory (reporting) requirements (vertical lineage)
The reporting frameworks that financial institutions need to comply with change frequently, and these changes affect the data flows used for reporting. The internal data landscape of financial institutions also evolves. Data lineage provides insight into the potential impact of such changes on your data processes. This helps your organization to anticipate changes before they take effect, which reduces error-solving costs and increases efficiency.
- Unpacking the “black box” that data processes usually are (horizontal lineage)
With reporting becoming more data driven, collaboration between the Finance, Risk and IT departments of your organization becomes even more crucial. Visualization of data lineage can open up the “black box” of data flows and thus create greater transparency for all business users.
- Insight into the reported figures, traced back to granular source data (physical lineage)
In the reporting environment there is a shift from creating reports with aggregated data towards reports that use granular (or detailed) source data. Financial institutions find it challenging to trace final reporting figures back to the initial source of the data and to identify the transformation logic applied along the way.
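Two of the applications above, assessing the impact of a change and tracing a reported figure back to its sources, come down to walking a lineage graph in opposite directions. The following is a minimal sketch under stated assumptions: lineage is stored as a flat list of edges, and all system, table and field names as well as the transformation descriptions are hypothetical.

```python
from collections import deque

# Hypothetical lineage edges: (upstream, downstream, transformation logic).
EDGES = [
    ("core_banking.loans.balance", "staging.exposure.amount_eur",
     "convert to EUR"),
    ("fx_feed.rates.eur_rate", "staging.exposure.amount_eur",
     "look up rate on reporting date"),
    ("staging.exposure.amount_eur", "report.total_exposure",
     "sum per counterparty"),
    ("staging.exposure.amount_eur", "report.large_exposures",
     "filter above threshold"),
]


def trace_upstream(node, edges):
    """Walk from a reported figure back to its sources, collecting each hop."""
    hops, queue, seen = [], deque([node]), {node}
    while queue:
        current = queue.popleft()
        for upstream, downstream, logic in edges:
            if downstream == current:
                hops.append((upstream, downstream, logic))
                if upstream not in seen:
                    seen.add(upstream)
                    queue.append(upstream)
    return hops


def impacted_downstream(node, edges):
    """Return every element that depends, directly or indirectly, on node."""
    impacted, queue = set(), deque([node])
    while queue:
        current = queue.popleft()
        for upstream, downstream, _ in edges:
            if upstream == current and downstream not in impacted:
                impacted.add(downstream)
                queue.append(downstream)
    return impacted


if __name__ == "__main__":
    for upstream, downstream, logic in trace_upstream("report.total_exposure", EDGES):
        print(f"{downstream} <- {upstream} ({logic})")
    print(sorted(impacted_downstream("fx_feed.rates.eur_rate", EDGES)))
```

Here `trace_upstream` answers "where does this figure come from and what was done to it", while `impacted_downstream` answers "which reports are affected if this source changes". A production lineage tool would typically hold the edges in a metadata repository rather than in a list.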
Transparency and visualization of your data flows enable reporting specialists, business analysts and data specialists to work together and understand each other. They enable an efficient “search for data” in your organization, resulting in cost reductions and shorter lead times for resolving errors and handling change requests.
When implemented and used correctly, data lineage may enhance the control over your data transformation and reporting processes. This in turn will improve the quality of your data and reports and ease the discussion with the supervisory bodies.
Data lineage implementation considerations and business benefits
A prerequisite for data lineage is a data governance framework that includes a well-defined metadata strategy and ownership of metadata within your organization. The focus should be on metadata being available, accurate and complete. The next step is to identify and connect all your data and its metadata, from source to transformation layer to reporting solution.
Metadata identification and management processes are currently known to be labour intensive, inefficient and expensive. Manual metadata-tagging in particular is costly: on average € 2 to € 5 per metadata item.2 Robotics and Artificial Intelligence (AI) can play a major role in metadata-tagging and the generation of metadata. According to the same source, AI can reduce this cost by a factor of 10 while at the same time improving the overall quality of your metadata and thus of your data lineage.
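To make these figures concrete, the arithmetic can be worked through for a given number of metadata items. This is only an illustrative calculation: the per-item cost range and the factor of 10 are the figures cited above, while the volume of 100,000 items and the function name are assumptions made for the example.

```python
MANUAL_COST_RANGE_EUR = (2.0, 5.0)  # per metadata item, as cited in the text
REDUCTION_FACTOR = 10               # the "10x" reduction cited in the text


def tagging_cost_range(items, cost_range=MANUAL_COST_RANGE_EUR, reduction=1):
    """Return the (low, high) total tagging cost in EUR for a number of items."""
    low, high = cost_range
    return (items * low / reduction, items * high / reduction)


if __name__ == "__main__":
    items = 100_000  # hypothetical volume, not taken from the article
    manual = tagging_cost_range(items)
    automated = tagging_cost_range(items, reduction=REDUCTION_FACTOR)
    print(f"manual:    EUR {manual[0]:,.0f} to {manual[1]:,.0f}")
    print(f"automated: EUR {automated[0]:,.0f} to {automated[1]:,.0f}")
```

Under these assumptions, manual tagging of 100,000 items would cost between € 200,000 and € 500,000, against € 20,000 to € 50,000 with automated tagging.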
Therefore, automating the real-time identification and connection of metadata, as well as the visualization of data lineage, will enhance many data-intensive reporting processes in your organization, while also significantly reducing the costs associated with them.
This in turn will lay the cornerstone for your organization to become future-proof and, eventually, truly data-driven. In our next article we will elaborate on the application of these automation techniques in the world of metadata management and data lineage, and on how they can help your organization improve its reporting processes.
1 DAMA International (2017), Data Management Body of Knowledge, 2nd edition, page 28
2 EDIA (2018), Why is automated metadata-tagging better than manual tagging?
Would you like to know more about Metadata Management or Data Lineage? Please contact Yuri Jolly or Pim Wesselink via the details below.