Fully Dynamic Partitioning: Handling Data Skew in Parallel Data Cube Computation

Authors:
Hongjun Lu, Jeffrey Xu Yu, Ling Feng, Zhixian Li
Affiliations:
School of Computing, The National University of Singapore, Singapore, Republic of Singapore. luhj@comp.nus.edu.sg
Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong, Hong Kong, People's Republic of China. yu@se.cuhk.edu.hk
InfoLab, Tilburg University, The Netherlands. ling@kub.nl
School of Computing, The National University of Singapore, Singapore, Republic of Singapore. lizhixia@comp.nus.edu.sg
Venue:
Distributed and Parallel Databases
Year:
2003

Abstract

Parallel data processing is a promising approach for efficiently computing data cubes in relational databases, because most aggregate functions used in OLAP (On-Line Analytical Processing) are distributive. This paper studies how to handle data skew in parallel data cube computation. We present a fully dynamic partitioning approach that effectively distributes the workload among processing nodes without a priori knowledge of the data distribution. As a supplement, a simple and effective dynamic load-balancing mechanism is incorporated into the algorithm, further improving overall performance. Our experimental results indicate that the proposed techniques are effective even when data skew is high. The results of scale-up and speedup tests are also satisfactory.
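Why distributivity enables parallel cube computation, and how work can be spread without knowing the data distribution in advance, can be illustrated with a minimal sketch. This is not the authors' fully dynamic partitioning algorithm, whose details are not given in the abstract. It assumes a single SUM measure, uses Python threads to stand in for processing nodes, and substitutes a generic shared work queue (each node pulls small chunks until none remain) for the paper's partitioning and load-balancing scheme. The names `cube_partial`, `parallel_cube`, and `chunk_size` are hypothetical.

```python
import queue
import threading
from collections import Counter, defaultdict
from itertools import combinations


def cube_partial(rows, dims):
    """Compute SUM(measure) for every group-by (cuboid) over one chunk of rows.

    Each row is (dim_0, ..., dim_{n-1}, measure). A cuboid is identified by the
    tuple of dimension indices it groups on; () is the grand total.
    """
    partial = defaultdict(Counter)
    for k in range(len(dims) + 1):
        for group in combinations(range(len(dims)), k):
            for *keys, measure in rows:
                partial[group][tuple(keys[i] for i in group)] += measure
    return partial


def parallel_cube(rows, dims, n_workers=4, chunk_size=2):
    """Hypothetical sketch: workers pull small chunks from a shared queue.

    No statistics about the data are used; a worker that gets an expensive
    chunk simply ends up taking fewer chunks (generic work-queue balancing,
    not the paper's specific mechanism).
    """
    tasks = queue.Queue()
    for start in range(0, len(rows), chunk_size):
        tasks.put(rows[start:start + chunk_size])

    results = []
    load = [0] * n_workers  # rows processed by each worker
    lock = threading.Lock()

    def worker(wid):
        while True:
            try:
                chunk = tasks.get_nowait()
            except queue.Empty:
                return
            partial = cube_partial(chunk, dims)
            with lock:
                results.append(partial)
                load[wid] += len(chunk)

    threads = [threading.Thread(target=worker, args=(w,)) for w in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    # SUM is distributive: adding the partial sums key by key gives global sums.
    cube = defaultdict(Counter)
    for partial in results:
        for group, sums in partial.items():
            cube[group].update(sums)
    return cube, load


sales = [("NY", "tv", 5), ("NY", "pc", 3), ("LA", "tv", 2),
         ("NY", "tv", 1), ("LA", "pc", 4)]
cube, load = parallel_cube(sales, dims=("city", "product"), n_workers=2)
print(cube[()][()])                # 15  (grand total)
print(cube[(0,)][("NY",)])         # 9   (GROUP BY city)
print(cube[(0, 1)][("NY", "tv")])  # 6   (GROUP BY city, product)
```

Because SUM (like COUNT, MIN, and MAX) is distributive, partial cubes computed on arbitrary chunks can be merged by combining values key by key; an algebraic function such as AVG would instead have to carry (sum, count) pairs through the merge. Python threads share one interpreter lock, so this sketch shows the structure of the computation rather than real speedup.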