Generating data quality rules and integration into ETL process

  • Authors: Jasna Rodic; Mirta Baranovic
  • Affiliations: Oracle Croatia, Zagreb, Croatia; University of Zagreb, Zagreb, Croatia
  • Venue: Proceedings of the ACM twelfth international workshop on Data warehousing and OLAP
  • Year: 2009

Abstract

Many data quality projects are integrated into data warehouse projects without enough time allocated for the data quality part. This creates a need for a quicker data quality process implementation that can be easily adopted as the first stage of data warehouse implementation. We will see that many data quality rules can be implemented in a similar way, and can thus be generated from metadata tables that store information about the rules. These generated rules are then used to check data in designated tables and mark erroneous records, or to perform certain updates of invalid data. We will also store information about rule violations in order to support analysis of such data, which could give significant insight into our source systems. The entire data quality process will be integrated into the ETL process in order to achieve a data warehouse load that is as automated, as correct and as quick as possible. Only a small number of records would be left for manual inspection and reprocessing.
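
The approach outlined in the abstract (rules described in metadata tables, checks generated from them, erroneous records marked, violations stored for analysis) can be illustrated with a minimal sketch. It is not the authors' implementation: the table names `dq_rule`, `dq_violation` and `stg_customer`, the `dq_error` flag column, the two rule types and the use of SQLite are all assumptions chosen only to keep the example self-contained.

```python
import sqlite3

# Hypothetical rule templates: each rule type maps to a SQL predicate
# that is TRUE for rows violating the rule.
RULE_TEMPLATES = {
    "not_null": "{column} IS NULL",
    "range": "{column} IS NOT NULL AND ({column} < {param1} OR {column} > {param2})",
}


def generate_check_sql(rule):
    """Build the SELECT that finds rows violating one metadata-defined rule."""
    predicate = RULE_TEMPLATES[rule["rule_type"]].format(
        column=rule["column_name"],
        param1=rule["param1"],
        param2=rule["param2"],
    )
    return f"SELECT rowid FROM {rule['table_name']} WHERE {predicate}"


def run_rules(conn):
    """Run every rule in dq_rule, log violations, and flag the bad rows."""
    conn.row_factory = sqlite3.Row
    rules = conn.execute("SELECT * FROM dq_rule ORDER BY rule_id").fetchall()
    for rule in rules:
        # Materialize the violating row ids before modifying the table.
        bad_ids = [row[0] for row in conn.execute(generate_check_sql(rule))]
        for row_id in bad_ids:
            conn.execute(
                "INSERT INTO dq_violation (rule_id, table_name, row_id) "
                "VALUES (?, ?, ?)",
                (rule["rule_id"], rule["table_name"], row_id),
            )
            conn.execute(
                f"UPDATE {rule['table_name']} SET dq_error = 1 WHERE rowid = ?",
                (row_id,),
            )
    conn.commit()


def demo():
    """Create assumed metadata and staging tables, then apply the rules."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE dq_rule (
            rule_id INTEGER PRIMARY KEY, table_name TEXT, column_name TEXT,
            rule_type TEXT, param1 TEXT, param2 TEXT);
        CREATE TABLE dq_violation (
            rule_id INTEGER, table_name TEXT, row_id INTEGER);
        CREATE TABLE stg_customer (
            name TEXT, age INTEGER, dq_error INTEGER DEFAULT 0);
        INSERT INTO dq_rule VALUES
            (1, 'stg_customer', 'name', 'not_null', NULL, NULL),
            (2, 'stg_customer', 'age', 'range', '0', '120');
        INSERT INTO stg_customer (name, age) VALUES
            ('Ana', 34), (NULL, 40), ('Ivo', 150);
        """
    )
    run_rules(conn)
    return conn
```

Calling `demo()` and querying `dq_violation` afterwards shows one violation per failing row, while rows with `dq_error = 0` in `stg_customer` can proceed to the warehouse load. Because table and column names cannot be bound as SQL parameters, the sketch formats them into the statement directly, which is only acceptable if the metadata tables are trusted and maintained by the ETL team.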