Architecture of Parallel Spatial Data Warehouse: Balancing Algorithm and Resumption of Data Extraction

Authors:
Marcin Gorawski
Affiliations:
Silesian University of Technology, Institute of Computer Science, Akademicka 16, 44-100 Gliwice, Poland, e-mail: Marcin.Gorawski@polsl.pl
Venue:
Proceedings of the 2005 conference on Software Engineering: Evolution and Emerging Technologies
Year:
2005


Abstract

In this paper we present a Parallel Spatial Data Warehouse (PSDW) system that we use to aggregate and analyze large volumes of spatial data. The data is generated by utility meters that communicate via radio. The PSDW system is based on a data model called the cascaded star model. To make the PSDW system sufficiently interactive, we use parallel computing supported by a special indexing structure called an aggregation tree. Balancing the PSDW workload is essential to minimize the response time of submitted tasks. We have implemented two data partitioning schemes that use Hilbert and Peano curves for space ordering. The presented balancing algorithm iteratively calculates the optimal size of the partition loaded into each node by executing a series of aggregations on a test data set. We provide a collection of system test results and their analysis, which confirm that a balancing algorithm can be realized in the proposed way. During the ETL (Extraction, Transformation, and Loading) process, large amounts of data are transformed and loaded into the PSDW. ETL processes are sometimes interrupted by a failure. In such a case, one of the algorithms for resuming interrupted extraction is usually used. In this paper we analyze the influence of the data balancing used in the PSDW on the efficiency of the extraction and resumption processes.
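
The abstract does not give the details of the partitioning or balancing procedures, so the following is only a minimal Python sketch of the general idea, not the paper's algorithm. It assumes a square grid whose side is a power of two, uses the standard Hilbert-curve cell-to-index conversion for space ordering, and assumes a simple proportional update rule in which each node's share of the data is rescaled by its measured throughput on a test aggregation. The names `hilbert_index`, `rebalance`, and `partition`, and the node speeds in the usage lines, are hypothetical.

```python
def hilbert_index(n, x, y):
    """Position of grid cell (x, y) along a Hilbert curve on an n x n grid.

    n must be a power of two. This is the standard iterative conversion.
    """
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve has the right orientation.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d


def rebalance(shares, times):
    """One balancing step (assumed rule, not taken from the paper).

    shares[i] is the fraction of data on node i, times[i] the measured time
    of a test aggregation on that node. New shares are proportional to the
    observed throughput share / time.
    """
    rates = [s / t for s, t in zip(shares, times)]
    total = sum(rates)
    return [r / total for r in rates]


def partition(items, shares):
    """Cut an ordered list into contiguous chunks sized by the given shares."""
    chunks, start, acc = [], 0, 0.0
    for i, share in enumerate(shares):
        acc += share
        last = i == len(shares) - 1
        end = len(items) if last else round(acc * len(items))
        chunks.append(items[start:end])
        start = end
    return chunks


# Usage: order a 4 x 4 grid along the Hilbert curve, then size the
# partitions for three nodes with hypothetical relative speeds 1, 1, 2.
n = 4
cells = sorted(((x, y) for x in range(n) for y in range(n)),
               key=lambda c: hilbert_index(n, c[0], c[1]))

speeds = [1.0, 1.0, 2.0]                 # hypothetical, unknown to the balancer
shares = [1 / 3, 1 / 3, 1 / 3]           # start from an even split
for _ in range(3):                       # a few balancing iterations
    times = [s / v for s, v in zip(shares, speeds)]  # simulated measurement
    shares = rebalance(shares, times)

parts = partition(cells, shares)
```

Because consecutive cells on the Hilbert curve are spatial neighbors, each contiguous chunk covers a compact region, and the faster node in this simulated setup ends up with a proportionally larger chunk. In a real system the `times` values would come from actual test aggregations rather than from the linear model used here.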