Lineage Tracing in a Data Warehousing System

  • Venue: ICDE '00 Proceedings of the 16th International Conference on Data Engineering
  • Year: 2000

Abstract

A data warehousing system collects data from multiple distributed sources and stores the integrated information as materialized views in a local data warehouse. Users then perform data analysis and mining on the warehouse views. In many cases, the warehouse view contents alone are not sufficient for in-depth analysis. It is often useful to be able to "drill through" from interesting (or potentially erroneous) view data to the original source data that derived the view data. For a given view data item, identifying the exact set of base data items that produced the view data item is termed the view data lineage problem. Motivation for and applications of lineage tracing in a warehousing environment are provided in [2, 3]. In the context of the WHIPS data warehousing project at Stanford [4], we have developed a system that performs efficient and consistent lineage tracing. Some commercial data warehousing systems support schema-level lineage tracing, or provide specialized drill-down and/or drill-through facilities for multi-dimensional warehouse views. Our lineage tracing system supports more fine-grained instance-level lineage tracing for arbitrarily complex relational views, including aggregation. At view definition time, our system automatically generates lineage tracing procedures and supporting auxiliary views. At lineage tracing time, the system applies the tracing procedures to the source tables and/or auxiliary views to obtain the lineage results and to illustrate the specific view data derivation process.
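
To make the notion of instance-level lineage concrete, the following is a minimal sketch of tracing one row of an aggregate view back to the source rows that produced it. It is not the WHIPS implementation: the tables STORES and SALES, the view city_totals, and the function trace_lineage are all hypothetical names chosen for illustration, and the tracing procedure is written by hand and run directly against the source tables rather than generated automatically or backed by auxiliary views.

```python
from collections import defaultdict

# Hypothetical source tables (illustrative data, not from the paper).
STORES = [
    {"store_id": 1, "city": "Palo Alto"},
    {"store_id": 2, "city": "San Jose"},
]
SALES = [
    {"sale_id": 10, "store_id": 1, "amount": 30},
    {"sale_id": 11, "store_id": 1, "amount": 20},
    {"sale_id": 12, "store_id": 2, "amount": 5},
    {"sale_id": 13, "store_id": 2, "amount": 70},
]


def city_totals(stores, sales):
    """View definition: total sales amount per city (join + aggregation)."""
    city_of = {s["store_id"]: s["city"] for s in stores}
    totals = defaultdict(int)
    for sale in sales:
        totals[city_of[sale["store_id"]]] += sale["amount"]
    return dict(totals)


def trace_lineage(city, stores, sales):
    """Hand-written tracing procedure for city_totals (an assumption of this
    sketch): return the source rows that contributed to the view row `city`."""
    store_rows = [s for s in stores if s["city"] == city]
    store_ids = {s["store_id"] for s in store_rows}
    sale_rows = [r for r in sales if r["store_id"] in store_ids]
    return {"stores": store_rows, "sales": sale_rows}


view = city_totals(STORES, SALES)
lineage = trace_lineage("Palo Alto", STORES, SALES)

# Re-evaluating the view over the lineage alone reproduces the traced row.
rederived = city_totals(lineage["stores"], lineage["sales"])
```

In this sketch, the lineage of the "Palo Alto" row consists of one store row and the two sales rows for that store, and evaluating the view definition over just those rows yields the same view row. That is the property that lets a user "drill through" from a suspicious aggregate to the exact base data behind it.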