Influence of balancing used in a distributed data warehouse on the extraction process

Authors:
Marcin Gorawski;Pawel Marks
Affiliations:
Institute of Computer Science, Silesian University of Technology, Gliwice, Poland;Institute of Computer Science, Silesian University of Technology, Gliwice, Poland
Venue:
TEAA'05 Proceedings of the 31st VLDB conference on Trends in Enterprise Application Architecture
Year:
2005

References:
Efficient resumption of interrupted warehouse loads. SIGMOD '00 Proceedings of the 2000 ACM SIGMOD international conference on Management of data
AJAX: an extensible data cleaning tool. SIGMOD '00 Proceedings of the 2000 ACM SIGMOD international conference on Management of data
Striving towards Near Real-Time Data Integration for Data Warehouses. DaWaK 2000 Proceedings of the 4th International Conference on Data Warehousing and Knowledge Discovery
A framework for the design of ETL scenarios. CAiSE'03 Proceedings of the 15th international conference on Advanced information systems engineering

Abstract

A data warehouse is filled with data during the extraction process. This process is sometimes interrupted by a failure, after which the warehouse contains an incomplete data set. To load the missing part of the data, one of the algorithms for resuming an interrupted extraction is usually used. In this paper we analyze the influence of data balancing used in a distributed data warehouse on the efficiency of the extraction and resumption processes. For resumption we rely on the Design-Resume algorithm, which imposes no additional overhead on an uninterrupted extraction process. We present how the balancing is done and examine its influence on the efficiency of the ETL process. Finally, based on the results of the tests performed, we discuss the advantages and disadvantages of balancing with respect to the ETL process.