ETL Workflow Analysis and Verification Using Backwards Constraint Propagation

  • Authors:
  • Jie Liu;Senlin Liang;Dan Ye;Jun Wei;Tao Huang

  • Affiliations:
  • University of Science and Technology of China, Hefei, Anhui, China and Institute of Software, Chinese Academy of Sciences, Beijing, China;Department of Computer Science, State University of New York at Stony Brook, Stony Brook, NY 11794, USA;Institute of Software, Chinese Academy of Sciences, Beijing, China;Institute of Software, Chinese Academy of Sciences, Beijing, China;Institute of Software, Chinese Academy of Sciences, Beijing, China

  • Venue:
  • CAiSE '09 Proceedings of the 21st International Conference on Advanced Information Systems Engineering
  • Year:
  • 2009

Abstract

A major contribution of data warehouses is to support better decision making by facilitating data analysis, so data quality is of primary importance. ETL is the process that extracts data from sources, transforms it, and ultimately loads it into target warehouses. Although ETL workflows can be designed with ETL tools, data exceptions are largely left to human analysis and are handled inadequately. Early detection of exceptions helps improve the stability and efficiency of ETL workflows. To this end, a novel approach, Backwards Constraint Propagation (BCP), is proposed that automatically analyzes ETL workflows and verifies the target-end restrictions at their earliest points. BCP builds an ETL graph from a given ETL workflow, encodes the target-end restrictions as integrity constraints, and propagates them backwards from the target to the sources through the ETL graph by applying constraint projection rules. It is shown that BCP supports most relational algebra operators and data transformation functions.
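
The three steps named in the abstract (build an ETL graph, encode target-end restrictions as integrity constraints, propagate them backwards with projection rules) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not give the actual projection rules, so the `Node` and `NotNull` classes, the `project` and `propagate` functions, and the example workflow are all hypothetical, and only a single constraint type with a single non-trivial rule (column renaming) is modeled.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class NotNull:
    """Toy integrity constraint: the named column must not contain NULLs."""
    column: str


@dataclass
class Node:
    """One vertex of a toy ETL graph (hypothetical structure, not from the paper)."""
    name: str
    kind: str                                    # "source", "rename", "filter", or "target"
    inputs: list = field(default_factory=list)   # upstream Node objects
    mapping: dict = field(default_factory=dict)  # rename only: output column -> input column


def project(node, constraint):
    """Assumed projection rule: rewrite a constraint on a node's output
    into an equivalent constraint on its input."""
    if node.kind == "rename":
        # Translate the output column name back to the input column name.
        return NotNull(node.mapping.get(constraint.column, constraint.column))
    # In this sketch, filters and the target pass columns through unchanged.
    return constraint


def propagate(target, constraints):
    """Walk the graph from the target towards the sources and return
    a dict mapping each source name to the set of constraints that reach it."""
    result = {}
    stack = [(target, c) for c in constraints]
    while stack:
        node, c = stack.pop()
        if node.kind == "source":
            result.setdefault(node.name, set()).add(c)
            continue
        projected = project(node, c)
        for upstream in node.inputs:
            stack.append((upstream, projected))
    return result


# Hypothetical workflow: source -> rename -> filter -> target.
src = Node("orders_src", "source")
ren = Node("rename_cols", "rename", inputs=[src], mapping={"customer_id": "cust_id"})
flt = Node("drop_cancelled", "filter", inputs=[ren])
tgt = Node("warehouse_orders", "target", inputs=[flt])

# The target requires customer_id to be non-null; push that requirement upstream.
pushed = propagate(tgt, [NotNull("customer_id")])
print(pushed)
```

In this sketch the target-end restriction on `customer_id` arrives at the source as a restriction on `cust_id`, so a violating row could be rejected at extraction time instead of at load time. A fuller treatment would need rules for joins, aggregations, and transformation functions, which the abstract says BCP covers but does not spell out.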