Parallel data processing is a promising approach for efficiently computing data cubes in relational databases, because most aggregate functions used in OLAP (On-Line Analytical Processing) are distributive. This paper studies how to handle data skew in parallel data cube computation. We present a fully dynamic partitioning approach that effectively distributes the workload among processing nodes without a priori knowledge of the data distribution. As a supplement, a simple and effective dynamic load balancing mechanism is incorporated into our algorithm, which further improves overall performance. Our experimental results indicate that the proposed techniques remain effective even under high data skew. The results of the scale-up and speedup tests are also satisfactory.
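
The role of distributive aggregates can be illustrated with a minimal sketch. It is not the partitioning or load balancing algorithm of the paper; it only shows why a distributive function such as SUM lets each node aggregate its own partition and have the partial results merged afterwards. The function names, the column names (`region`, `product`, `sales`), and the sample rows are all hypothetical.

```python
from collections import defaultdict

def partial_sum(rows, group_cols, measure="sales"):
    """Aggregate one partition locally: group-by key -> SUM of the measure."""
    acc = defaultdict(int)
    for row in rows:
        key = tuple(row[c] for c in group_cols)
        acc[key] += row[measure]
    return dict(acc)

def merge_partials(partials):
    """Combine per-partition results; valid because SUM is distributive."""
    total = defaultdict(int)
    for part in partials:
        for key, value in part.items():
            total[key] += value
    return dict(total)

# Hypothetical fact table, split into two partitions as if held by two nodes.
rows = [
    {"region": "east", "product": "a", "sales": 10},
    {"region": "east", "product": "b", "sales": 5},
    {"region": "west", "product": "a", "sales": 7},
    {"region": "east", "product": "a", "sales": 3},
]
partitions = [rows[:2], rows[2:]]

# Each "node" aggregates its own partition; the partial results are then merged.
parallel = merge_partials(partial_sum(p, ["region"]) for p in partitions)
# Reference result computed over the whole table in one pass.
serial = partial_sum(rows, ["region"])
```

Here `parallel` and `serial` hold the same group totals regardless of how the rows are split. The way the rows are split still determines how much work each node does, which is where data skew becomes a problem.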