Implementing a data lineage tracker

  • Authors:
  • Colin Puri, Doo Soon Kim, Peter Z. Yeh, Kunal Verma

  • Affiliations:
  • Accenture Technology Labs, San Jose, California (all authors)

  • Venue:
  • DaWaK'12: Proceedings of the 14th International Conference on Data Warehousing and Knowledge Discovery
  • Year:
  • 2012

Abstract

Business users routinely need to trace the origin of the information used in calculations and business decisions. Knowing the origin and lineage of data can support decision making, provide a clear audit trail for regulatory compliance, and answer key questions such as who, what, where, when, why, and how. Tracking data lineage across a heterogeneous enterprise environment, however, raises many issues and challenges. This paper presents one method of tracking data lineage that answers the questions business users need answered, for both new and legacy applications in a heterogeneous infrastructure. Using trace logs from data sources, we show how our system tracks data lineage and determines how information flows from one data source to another as applications execute. Using both SQL and NoSQL systems, we measure the precision and recall of the proposed data lineage tracking system.
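As a rough illustration of lineage tracking from trace logs, the sketch below assumes a hypothetical log format in which each entry records an application run, an operation (read or write), and a data source. It links every source written during a run to every source read in that same run, walks the resulting graph backward to find a source's upstream origins, and scores detected lineage edges against a ground-truth set using the standard precision and recall definitions. The `TraceEvent` record, the run-level linking rule, the edge-level scoring, and all source names are assumptions for illustration only; this is not the system described in the paper.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class TraceEvent:
    # Hypothetical trace-log record (assumed format, not from the paper).
    run_id: str   # one execution of an application
    op: str       # "read" or "write"
    source: str   # e.g. "sql:orders" or "nosql:order_totals"


def build_lineage(events):
    """Map each written source to the sources read in the same run.

    Assumed linking rule: anything a run writes may derive from anything it read.
    """
    reads = defaultdict(set)
    writes = defaultdict(set)
    for e in events:
        if e.op == "read":
            reads[e.run_id].add(e.source)
        elif e.op == "write":
            writes[e.run_id].add(e.source)
    parents = defaultdict(set)
    for run_id, targets in writes.items():
        for target in targets:
            parents[target] |= reads[run_id] - {target}
    return parents


def origins(parents, source):
    """Return every upstream source that transitively fed into `source`."""
    seen, stack = set(), [source]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def edges(parents):
    """Flatten the lineage map into (parent, child) edges."""
    return {(p, child) for child, ps in parents.items() for p in ps}


def precision_recall(detected, truth):
    """Standard precision and recall over sets of lineage edges."""
    hits = len(detected & truth)
    precision = hits / len(detected) if detected else 0.0
    recall = hits / len(truth) if truth else 0.0
    return precision, recall


# Usage with a made-up log spanning SQL and NoSQL sources.
log = [
    TraceEvent("r1", "read", "sql:orders"),
    TraceEvent("r1", "read", "sql:prices"),
    TraceEvent("r1", "write", "nosql:order_totals"),
    TraceEvent("r2", "read", "nosql:order_totals"),
    TraceEvent("r2", "write", "sql:revenue_report"),
]
lineage = build_lineage(log)
print(sorted(origins(lineage, "sql:revenue_report")))
# prints ['nosql:order_totals', 'sql:orders', 'sql:prices']
```

The run-level rule is deliberately coarse: it over-approximates lineage whenever a run reads sources it does not actually use, which would lower precision in an edge-level evaluation like the one sketched here.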