Resumption of data extraction process in parallel data warehouses

  • Authors:
  • Marcin Gorawski;Pawel Marks

  • Affiliations:
  • Institute of Computer Science, Silesian University of Technology, Gliwice, Poland;Institute of Computer Science, Silesian University of Technology, Gliwice, Poland

  • Venue:
  • PPAM'05 Proceedings of the 6th international conference on Parallel Processing and Applied Mathematics
  • Year:
  • 2005

Abstract

ETL processes are sometimes interrupted by a failure. In such cases, one of the algorithms for resuming an interrupted extraction is usually applied. In this paper we present a modified Design-Resume (DR) algorithm, extended to handle ETL processes that contain many loading nodes, and use it to resume a parallel data warehouse load process. The key feature of the DR algorithm is that it imposes no additional overhead on the normal ETL process. Supporting more than one loading node increases the efficiency of the resumption process. We discuss the benefits of our improvements based on the results of the tests we performed.