Iceberg-cube computation with PC clusters

Authors:
Raymond T. Ng;Alan Wagner;Yu Yin
Affiliations:
Univ British Columbia, 2366 Main Mall, UBC, Vancouver, BC;Univ British Columbia, 2366 Main Mall, UBC, Vancouver, BC;Univ British Columbia, 2366 Main Mall, UBC, Vancouver, BC
Venue:
SIGMOD '01 Proceedings of the 2001 ACM SIGMOD international conference on Management of data
Year:
2001

References

Implementing data cubes efficiently. SIGMOD '96 Proceedings of the 1996 ACM SIGMOD international conference on Management of data.

An array-based algorithm for simultaneous multidimensional aggregates. SIGMOD '97 Proceedings of the 1997 ACM SIGMOD international conference on Management of data.

Online aggregation. SIGMOD '97 Proceedings of the 1997 ACM SIGMOD international conference on Management of data.

Exploratory mining and pruning optimizations of constrained associations rules. SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data.

Bottom-up computation of sparse and Iceberg CUBE. SIGMOD '99 Proceedings of the 1999 ACM SIGMOD international conference on Management of data.

High Performance OLAP and Data Mining on Parallel Computers. Data Mining and Knowledge Discovery.

Parallel Formulations of Decision-Tree Classification Algorithms. Data Mining and Knowledge Discovery.

Parallel and Distributed Association Mining: A Survey. IEEE Concurrency.

Index Selection for OLAP. ICDE '97 Proceedings of the Thirteenth International Conference on Data Engineering.

Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Total. ICDE '96 Proceedings of the Twelfth International Conference on Data Engineering.

Fast Computation of Sparse Datacubes. VLDB '97 Proceedings of the 23rd International Conference on Very Large Data Bases.

Materialized Views Selection in a Multidimensional Database. VLDB '97 Proceedings of the 23rd International Conference on Very Large Data Bases.

Materialized View Selection for Multidimensional Datasets. VLDB '98 Proceedings of the 24th International Conference on Very Large Data Bases.

Computing Iceberg Queries Efficiently. VLDB '98 Proceedings of the 24th International Conference on Very Large Data Bases.

Dynamic Load Balancing for Parallel Association Rule Mining on Heterogenous PC Cluster Systems. VLDB '99 Proceedings of the 25th International Conference on Very Large Data Bases.

Explaining Differences in Multidimensional Aggregates. VLDB '99 Proceedings of the 25th International Conference on Very Large Data Bases.

On the Computation of Multidimensional Aggregates. VLDB '96 Proceedings of the 22nd International Conference on Very Large Data Bases.

Abstract

In this paper, we investigate the approach of using low-cost PC clusters to parallelize the computation of iceberg-cube queries. We concentrate on techniques directed towards online querying of large, high-dimensional datasets, where it is assumed that the total cube has not been precomputed. The algorithmic space we explore considers trade-offs between parallelism, computation, and I/O. Our main contribution is the development and comprehensive evaluation of several novel parallel algorithms. Specifically: (1) Algorithm RP is a straightforward parallel version of BUC [BR99]; (2) Algorithm BPP attempts to reduce I/O by outputting results in a more efficient way; (3) Algorithm ASL, which maintains the cells of a cuboid in a skiplist, is designed to put the utmost priority on load balancing; and (4) alternatively, Algorithm PT load-balances by using binary partitioning to divide the cube lattice as evenly as possible.

We present a thorough performance evaluation of all these algorithms over a variety of parameters, including the dimensionality of the cube, the sparseness of the cube, the selectivity of the constraints, the number of processors, and the size of the dataset. A key finding is that no single algorithm fits all situations. We recommend a “recipe” that uses PT as the default algorithm but may also deploy ASL under specific circumstances.
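To make the two central ideas concrete (BUC-style bottom-up iceberg-cube computation, and load balancing by binary division of the cube lattice as PT does), here is a minimal sequential Python sketch. It is not the authors' implementation and makes several assumptions of our own: COUNT(*) is the only aggregate, the iceberg condition is COUNT(*) >= minsup, rows are in-memory tuples, and the lattice is split on "does the cuboid contain dimension d". The names `buc`, `partition_lattice`, and `compute_task` are hypothetical; tasks run in a loop that stands in for the processors of a cluster, and there is no I/O or message passing.

```python
from collections import Counter
from itertools import combinations


def buc(rows, num_dims, minsup):
    """BUC-style bottom-up iceberg cube for COUNT(*) >= minsup.

    Returns {cell: count}; a cell is a tuple of (dim, value) pairs in
    increasing dim order, and () is the apex ("all") cell.
    """
    out = {}

    def recurse(part, start, prefix):
        out[prefix] = len(part)
        for d in range(start, num_dims):
            # BUC partitions by sorting; grouping with a dict yields the same partitions.
            groups = {}
            for r in part:
                groups.setdefault(r[d], []).append(r)
            for v, sub in groups.items():
                # Anti-monotone pruning: if this cell misses minsup, no more
                # specific descendant cell can reach it, so skip the recursion.
                if len(sub) >= minsup:
                    recurse(sub, d + 1, prefix + ((d, v),))

    if len(rows) >= minsup:
        recurse(rows, 0, ())
    return out


def partition_lattice(num_dims, num_procs):
    """Binary division of the 2**num_dims cuboids into equal-size tasks.

    A task (include, free) stands for every cuboid made of all dims in
    `include` plus any subset of `free`, i.e. 2**len(free) cuboids.
    """
    tasks = [((), tuple(range(num_dims)))]
    while len(tasks) < num_procs:
        # Split the largest task on its first free dimension.
        tasks.sort(key=lambda t: len(t[1]))
        include, free = tasks.pop()
        if not free:  # nothing left to split
            tasks.append((include, free))
            break
        d, rest = free[0], free[1:]
        tasks.append((include + (d,), rest))  # cuboids that contain d
        tasks.append((include, rest))         # cuboids that do not contain d
    return tasks


def compute_task(rows, task, minsup):
    """Compute the iceberg cells of every cuboid in one task."""
    include, free = task
    out = {}
    for k in range(len(free) + 1):
        for extra in combinations(free, k):
            dims = tuple(sorted(include + extra))
            # Plain group-by count per cuboid (simpler than BUC's recursion).
            counts = Counter(tuple((d, r[d]) for d in dims) for r in rows)
            out.update({cell: c for cell, c in counts.items() if c >= minsup})
    return out


rows = [("a1", "b1", "c1"), ("a1", "b1", "c2"), ("a1", "b2", "c1"), ("a2", "b1", "c1")]
minsup = 2
tasks = partition_lattice(3, 4)
merged = {}
for t in tasks:  # on a cluster, each task would run on its own node
    merged.update(compute_task(rows, t, minsup))
assert merged == buc(rows, 3, minsup)
```

Because each binary split halves a task's cuboid count, the tasks in this sketch hold equal numbers of cuboids; as the abstract's comparison with ASL suggests, equal cuboid counts need not mean equal work, since cuboid sizes and pruning vary with the data.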