PnP: sequential, external memory, and parallel iceberg cube computation

Authors:
Ying Chen;Frank Dehne;Todd Eavis;Andrew Rau-Chaplin
Affiliations:
Microsoft Corp., Redmond, USA;School of Computer Science, Carleton University, Ottawa, Canada;Department of Computer Science, Concordia University, Montreal, Canada;Faculty of Computer Science, Dalhousie University, Halifax, Canada
Venue:
Distributed and Parallel Databases
Year:
2008

References (24)

Parallel database systems: the future of high performance database systems
Communications of the ACM

Implementing data cubes efficiently
SIGMOD '96 Proceedings of the 1996 ACM SIGMOD international conference on Management of data

Cubetree: organization of and bulk incremental updates on the data cube
SIGMOD '97 Proceedings of the 1997 ACM SIGMOD international conference on Management of data

An array-based algorithm for simultaneous multidimensional aggregates
SIGMOD '97 Proceedings of the 1997 ACM SIGMOD international conference on Management of data

High performance multidimensional analysis of large datasets
Proceedings of the 1st ACM international workshop on Data warehousing and OLAP

Bottom-up computation of sparse and Iceberg CUBE
SIGMOD '99 Proceedings of the 1999 ACM SIGMOD international conference on Management of data

A dynamic load balancing strategy for parallel datacube computation
Proceedings of the 2nd ACM international workshop on Data warehousing and OLAP

Iceberg-cube computation with PC clusters
SIGMOD '01 Proceedings of the 2001 ACM SIGMOD international conference on Management of data

Dwarf: shrinking the PetaCube
Proceedings of the 2002 ACM SIGMOD international conference on Management of data

Parallelizing the Data Cube
Distributed and Parallel Databases - Special issue: Parallel and distributed data mining

High Performance OLAP and Data Mining on Parallel Computers
Data Mining and Knowledge Discovery

Fully Dynamic Partitioning: Handling Data Skew in Parallel Data Cube Computation
Distributed and Parallel Databases

Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Total
ICDE '96 Proceedings of the Twelfth International Conference on Data Engineering

Fast Computation of Sparse Datacubes
VLDB '97 Proceedings of the 23rd International Conference on Very Large Data Bases

Computing Iceberg Queries Efficiently
VLDB '98 Proceedings of the 24th International Conference on Very Large Data Bases

On the Computation of Multidimensional Aggregates
VLDB '96 Proceedings of the 22nd International Conference on Very Large Data Bases

A Cluster Architecture for Parallel Data Warehousing
CCGRID '01 Proceedings of the 1st International Symposium on Cluster Computing and the Grid

A Parallel Scalable Infrastructure for OLAP and Data Mining
IDEAS '99 Proceedings of the 1999 International Symposium on Database Engineering & Applications

QC-trees: an efficient summary structure for semantic OLAP
Proceedings of the 2003 ACM SIGMOD international conference on Management of data

Condensed Cube: An Efficient Approach to Reducing Data Cube Size
ICDE '02 Proceedings of the 18th International Conference on Data Engineering

Parallel ROLAP Data Cube Construction on Shared-Nothing Multiprocessors
Distributed and Parallel Databases

Building Large ROLAP Data Cubes in Parallel
IDEAS '04 Proceedings of the International Database Engineering and Applications Symposium

Quotient cube: how to summarize the semantics of a data cube
VLDB '02 Proceedings of the 28th international conference on Very Large Data Bases

Star-cubing: computing iceberg cubes by top-down and bottom-up integration
VLDB '03 Proceedings of the 29th international conference on Very large data bases - Volume 29

Cited by (1)

Parallel Real-Time OLAP on Multi-core Processors
CCGRID '12 Proceedings of the 2012 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (ccgrid 2012)

Abstract

We present "Pipe 'n Prune" (PnP), a new hybrid method for iceberg-cube query computation. The novelty of our method is that it achieves a tight integration of top-down piping for data aggregation with bottom-up a priori data pruning. A particular strength of PnP is that it is efficient for all of the following scenarios: (1) sequential iceberg-cube queries, (2) external memory iceberg-cube queries, and (3) parallel iceberg-cube queries on shared-nothing PC clusters with multiple disks.

We performed an extensive performance analysis of PnP for the above scenarios with the following main results. In the first scenario, PnP performs very well for both dense and sparse data sets, providing an interesting alternative to BUC and Star-Cubing. In the second scenario, PnP shows a surprisingly efficient handling of disk I/O, with an external memory running time that is less than twice the running time for full in-memory computation of the same iceberg-cube query. In the third scenario, PnP scales very well, providing near linear speedup for a larger number of processors and thereby solving the scalability problem observed for the parallel iceberg-cubes proposed by Ng et al.
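
For readers unfamiliar with the terminology, the following is a minimal sketch of what an iceberg-cube query computes and of the a priori pruning property the abstract refers to. It is not the PnP algorithm and does not attempt top-down piping, external memory handling, or parallelism. The function name `iceberg_cube`, the COUNT aggregate, the `min_support` parameter, and the sample rows are all assumptions made for illustration. The "all" group-by over zero dimensions is omitted for brevity.

```python
from collections import Counter
from itertools import combinations


def iceberg_cube(rows, num_dims, min_support):
    """Naive iceberg cube with COUNT as the aggregate (illustrative only).

    Returns {dims: {cell: count}} where dims is a tuple of dimension indexes
    and only cells with count >= min_support are kept.
    """
    cube = {}
    # Visit group-bys from fewest to most dimensions, so every (k-1)-dimensional
    # group-by is finished before any k-dimensional one is computed.
    for k in range(1, num_dims + 1):
        for dims in combinations(range(num_dims), k):
            counts = Counter()
            for row in rows:
                if k > 1 and not _parents_survive(cube, dims, row):
                    # A priori pruning: a cell cannot reach min_support if one
                    # of its coarser projections already failed to reach it.
                    continue
                counts[tuple(row[d] for d in dims)] += 1
            cube[dims] = {c: n for c, n in counts.items() if n >= min_support}
    return cube


def _parents_survive(cube, dims, row):
    """True if every projection of the cell onto len(dims)-1 dimensions was kept."""
    for drop in dims:
        parent_dims = tuple(d for d in dims if d != drop)
        parent_cell = tuple(row[d] for d in parent_dims)
        if parent_cell not in cube[parent_dims]:
            return False
    return True


# Hypothetical three-dimensional fact table (dimensions A, B, C).
rows = [
    ("a1", "b1", "c1"),
    ("a1", "b1", "c2"),
    ("a1", "b2", "c1"),
    ("a2", "b1", "c1"),
]
cube = iceberg_cube(rows, num_dims=3, min_support=2)
for dims, cells in cube.items():
    print(dims, cells)
```

In this sketch the pruning only avoids counting rows whose coarser cells were already discarded. Practical algorithms such as those named in the abstract avoid the repeated full scans that this naive version performs for each group-by.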