ETLDiff: a semi-automatic framework for regression test of ETL software

Authors:
Christian Thomsen; Torben Bach Pedersen
Affiliations:
Department of Computer Science, Aalborg University; Department of Computer Science, Aalborg University
Venue:
DaWaK '06: Proceedings of the 8th International Conference on Data Warehousing and Knowledge Discovery
Year:
2006

Abstract

Modern software development methods such as Extreme Programming (XP) favor frequently repeated tests, so-called regression tests, which catch new errors introduced when software is updated or tuned by checking that the software still produces the right results for a reference input. Regression testing is also very valuable for Extract-Transform-Load (ETL) software, which tends to be complex and error-prone. However, regression testing of ETL software is currently cumbersome and requires considerable manual effort. In this paper, we describe a novel, easy-to-use, and efficient semi-automatic framework for regression testing of ETL software. By automatically analyzing the schema, the tool detects how tables are related and uses this knowledge, along with optional user specifications, to determine exactly which data warehouse (DW) data should be identical across test ETL runs, leaving out change-prone values such as surrogate keys. The framework also provides tools for quickly detecting and displaying differences between the current ETL results and the reference results. In summary, manual work for test setup is reduced to a minimum, while still ensuring an efficient testing procedure.
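
The comparison idea described in the abstract can be illustrated with a minimal sketch. This is not ETLDiff's implementation or interface; it only assumes a SQLite database with declared foreign keys, and the names comparable_rows, diff_runs, and load, as well as the sales/product schema, are hypothetical. The sketch reads the foreign keys from the schema, joins a fact table to the tables it references, drops the key columns, and compares the remaining rows of a reference run and a current run as multisets.

```python
import sqlite3
from collections import Counter


def comparable_rows(conn, fact):
    """Return the fact rows joined to their dimensions, without key columns.

    Assumption: every foreign key names its target column explicitly.
    """
    fks = conn.execute(f"PRAGMA foreign_key_list({fact})").fetchall()
    fk_cols = {fk[3] for fk in fks}  # fact columns that hold surrogate keys
    # Keep only non-key columns of the fact table (for example, measures).
    cols = [f"{fact}.{c[1]}"
            for c in conn.execute(f"PRAGMA table_info({fact})")
            if c[1] not in fk_cols and not c[5]]
    joins = []
    for i, fk in enumerate(fks):
        dim, src, dst = fk[2], fk[3], fk[4]
        alias = f"d{i}"  # alias so one dimension may be referenced twice
        joins.append(f"JOIN {dim} AS {alias} ON {fact}.{src} = {alias}.{dst}")
        # Descriptive attributes stand in for the dropped surrogate key.
        cols += [f"{alias}.{c[1]}"
                 for c in conn.execute(f"PRAGMA table_info({dim})")
                 if not c[5]]
    sql = f"SELECT {', '.join(cols)} FROM {fact} {' '.join(joins)}"
    return Counter(conn.execute(sql).fetchall())


def diff_runs(reference, current, fact):
    """Return (rows only in the reference run, rows only in the current run)."""
    ref = comparable_rows(reference, fact)
    cur = comparable_rows(current, fact)
    return sorted((ref - cur).elements()), sorted((cur - ref).elements())


def load(rows):
    """Hypothetical stand-in for an ETL run; keys depend on the load order."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE product (product_id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE sales (
            product_id INTEGER REFERENCES product(product_id),
            amount INTEGER
        );
    """)
    for name, amount in rows:
        key = conn.execute(
            "INSERT INTO product (name) VALUES (?)", (name,)).lastrowid
        conn.execute("INSERT INTO sales VALUES (?, ?)", (key, amount))
    return conn


reference = load([("apple", 10), ("pear", 5)])
reordered = load([("pear", 5), ("apple", 10)])  # same data, different keys
changed = load([("apple", 10), ("pear", 7)])    # one measure differs

print(diff_runs(reference, reordered, "sales"))
print(diff_runs(reference, changed, "sales"))
```

In this sketch the reordered run assigns different surrogate keys but reports no differences, while the changed run reports the one row whose amount differs. Comparing multisets rather than sets keeps duplicate rows significant.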