Towards generating ETL processes for incremental loading

  • Authors:
  • Thomas Jörg;Stefan Deßloch

  • Affiliations:
  • University of Kaiserslautern, Kaiserslautern, Germany;University of Kaiserslautern, Kaiserslautern, Germany

  • Venue:
  • IDEAS '08 Proceedings of the 2008 international symposium on Database engineering & applications
  • Year:
  • 2008

Quantified Score

Hi-index 0.00

Visualization

Abstract

Extract, Transform, and Load (ETL) processes physically integrate data from multiple, heterogeneous sources in a central repository referred to as data warehouse. Physically integrated data gets stale when source data is changed, hence periodic refreshes are required. For efficiency reasons data warehouses are typically refreshed incrementally, i.e. changes are captured at the sources and propagated to the data warehouse on a regular basis. Dedicated ETL processes referred to as incremental load processes are employed to extract changes from the sources, propagate the changes, and refresh the data warehouse incrementally. Changes required in the data warehouse are inferred from changes captured at the sources during change propagation. The creation of incremental load processes is a complex task reserved to trained ETL programmers. In this paper we review existing Change Data Capture (CDC) techniques and discuss limitations of different approaches. We further review existing techniques for refreshing data warehouses. We then present an approach for generating incremental load processes from abstract schema mappings.