Architecture of Parallel Spatial Data Warehouse: Balancing Algorithm and Resumption of Data Extraction

  • Authors:
  • Marcin Gorawski

  • Affiliations:
  • Silesian University of Technology, Institute of Computer Science, Akademicka 16, 44-100 Gliwice, Poland, e-mail: Marcin.Gorawski@polsl.pl

  • Venue:
  • Proceedings of the 2005 conference on Software Engineering: Evolution and Emerging Technologies
  • Year:
  • 2005

Abstract

In this paper we present a Parallel Spatial Data Warehouse (PSDW) system that we use for the aggregation and analysis of huge amounts of spatial data. The data is generated by utility meters that communicate via radio. The PSDW system is based on a data model called the cascaded star model. To provide satisfactory interactivity, the PSDW system uses parallel computing supported by a special indexing structure called an aggregation tree. Balancing the workload of the PSDW system is essential for minimizing the response time of submitted tasks. We have implemented two data partitioning schemes that use Hilbert and Peano curves to order space. The presented balancing algorithm iteratively calculates the optimal size of the partition loaded into each node by executing a series of aggregations on a test data set. We provide a collection of system test results, and their analysis confirms that the balancing algorithm can be realized in the proposed way.

During the ETL (Extraction, Transformation and Loading) process, large amounts of data are transformed and loaded into the PSDW. ETL processes are sometimes interrupted by a failure; in such cases, one of the algorithms for resuming an interrupted extraction is usually applied. In this paper we analyze the influence of the data balancing used in the PSDW on the efficiency of the extraction and resumption processes.
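The partitioning and balancing steps summarized above can be illustrated with a minimal Python sketch. It assumes the meter locations are 2-D integer grid cells, orders them along a Hilbert curve (the Peano-curve scheme is not shown), and cuts the ordered sequence into one contiguous slice per node. The `measure` hook stands in for executing test aggregations on each node. The update rule used here (each node's new share is its old share divided by its measured time, then renormalized) is an assumption chosen for illustration, not the author's published algorithm, and all function names are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Point = Tuple[int, int]


def hilbert_index(order: int, x: int, y: int) -> int:
    """Position of cell (x, y) along a Hilbert curve filling a 2**order x 2**order grid."""
    n = 1 << order
    d = 0
    s = n >> 1
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/reflect so the next sub-quadrant is read in canonical orientation.
        if ry == 0:
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s >>= 1
    return d


def partition(points: Sequence[Point], shares: Sequence[float], order: int) -> List[List[Point]]:
    """Sort points along the Hilbert curve and cut them into contiguous slices sized by `shares`."""
    ordered = sorted(points, key=lambda p: hilbert_index(order, p[0], p[1]))
    parts: List[List[Point]] = []
    start, cumulative = 0, 0.0
    for i, share in enumerate(shares):
        cumulative += share
        # The last slice takes the remainder so no point is lost to rounding.
        end = len(ordered) if i == len(shares) - 1 else round(cumulative * len(ordered))
        parts.append(list(ordered[start:end]))
        start = end
    return parts


def balance(
    points: Sequence[Point],
    measure: Callable[[int, List[Point]], float],  # hypothetical hook: test-aggregation time on a node
    nodes: int,
    order: int,
    tol: float = 0.05,
    max_iter: int = 20,
) -> Tuple[List[float], List[List[Point]], List[float]]:
    """Iteratively resize partitions until per-node test-aggregation times are roughly equal."""
    shares = [1.0 / nodes] * nodes
    for _ in range(max_iter):
        parts = partition(points, shares, order)
        times = [measure(i, part) for i, part in enumerate(parts)]  # must be > 0
        if max(times) <= (1.0 + tol) * min(times):
            break
        # Assumed update rule (not from the paper): share proportional to share / time.
        rates = [s / t for s, t in zip(shares, times)]
        total = sum(rates)
        shares = [r / total for r in rates]
    return shares, parts, times


if __name__ == "__main__":
    # Hypothetical node speeds (points per time unit) plus a fixed per-task overhead;
    # these are simulated values, not measurements from the paper.
    speeds = [1.0, 2.0, 4.0]

    def simulated_aggregation_time(node: int, part: List[Point]) -> float:
        return 1.0 + len(part) / speeds[node]

    grid = [(x, y) for x in range(64) for y in range(64)]
    shares, parts, times = balance(grid, simulated_aggregation_time, len(speeds), order=6)
    print([len(p) for p in parts], [round(t, 1) for t in times])
```

Cutting a space-filling-curve ordering into contiguous slices typically keeps spatially close meters on the same node; with the simulated speeds above, the faster nodes end up holding proportionally more points.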