A taxonomy of ETL activities

  • Authors:
  • Panos Vassiliadis;Alkis Simitsis;Eftychia Baikousi

  • Affiliations:
  • Univ. of Ioannina, Ioannina, Greece;HP, Palo Alto, CA, USA;Univ. of Ioannina, Ioannina, Greece

  • Venue:
  • Proceedings of the ACM twelfth international workshop on Data warehousing and OLAP
  • Year:
  • 2009

Abstract

Extract-Transform-Load (ETL) activities are software modules responsible for populating a data warehouse with operational data, which have undergone a series of transformations on their way to the warehouse. The whole process is very complex and of significant importance for the design and maintenance of the data warehouse. A plethora of commercial ETL tools are already available in the market. However, each one of them follows a different approach to modeling ETL activities, i.e., the building blocks of an ETL workflow. As a result, so far there is no standard or unified approach for describing such activities. In this paper, we work towards the identification of generic properties that characterize ETL activities. In doing so, we follow a black-box approach: we provide a taxonomy that characterizes ETL activities in terms of the relationship of their input to their output, and a normal form that is based on interpreted semantics for the black-box activities. Finally, we show how the proposed taxonomy can be used in the construction of larger modules, i.e., ETL archetype patterns, which can be used for the composition and optimization of ETL workflows.
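
To make the black-box idea concrete, the following is a minimal sketch, not the paper's taxonomy: it treats an activity as an opaque function from rows to rows and describes it only by how its output relates to its input on a sample. The function `io_profile`, the three example activities, and the labels "same", "fewer", and "more" are all hypothetical names chosen for illustration; the abstract does not list the actual categories.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]
Activity = Callable[[List[Row]], List[Row]]


def io_profile(activity: Activity, sample: List[Row]) -> Tuple[str, bool]:
    """Describe a black-box activity by its input/output relationship.

    Returns (cardinality label, whether the set of attribute names is unchanged).
    The labels are illustrative assumptions, not categories from the paper.
    """
    out = activity(sample)
    if len(out) == len(sample):
        cardinality = "same"
    elif len(out) < len(sample):
        cardinality = "fewer"
    else:
        cardinality = "more"
    in_schema = {name for row in sample for name in row}
    out_schema = {name for row in out for name in row}
    return cardinality, in_schema == out_schema


# Three hypothetical activities, used only through their input and output.
def convert(rows: List[Row]) -> List[Row]:
    # Rewrites one attribute value in every row; one output row per input row.
    return [{**r, "amount": r["amount"] * 2} for r in rows]


def keep_positive(rows: List[Row]) -> List[Row]:
    # Drops rows that fail a predicate; attribute names are untouched.
    return [r for r in rows if r["amount"] > 0]


def total_per_customer(rows: List[Row]) -> List[Row]:
    # Groups rows and emits one row per group with a new attribute.
    totals: Dict[object, float] = {}
    for r in rows:
        totals[r["customer"]] = totals.get(r["customer"], 0.0) + r["amount"]
    return [{"customer": c, "total": t} for c, t in totals.items()]


SAMPLE: List[Row] = [
    {"customer": "a", "amount": 10.0},
    {"customer": "a", "amount": -5.0},
    {"customer": "b", "amount": 7.0},
]
```

On `SAMPLE`, `io_profile(convert, SAMPLE)` gives `("same", True)`, `keep_positive` gives `("fewer", True)`, and `total_per_customer` gives `("fewer", False)`. A single sample only suggests how an activity behaves; it does not prove the relationship holds for every input.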