Parallelizing the Data Cube

Authors:
Frank K. H. A. Dehne; Todd Eavis; Susanne E. Hambrusch; Andrew Rau-Chaplin
Venue:
ICDT '01 Proceedings of the 8th International Conference on Database Theory
Year:
2001

Abstract

This paper presents a general methodology for the efficient parallelization of existing data cube construction algorithms. We describe two partitioning strategies, one for top-down and one for bottom-up cube algorithms. Both assign subcubes to individual processors so that the processor loads are balanced. Our methods reduce inter-processor communication overhead by partitioning the load in advance, rather than computing each individual group-by in parallel as previous parallel approaches do. In fact, after the initial load distribution phase, each processor can compute its assigned subcube without communicating with the other processors. Our methods also enable code reuse: existing sequential (external-memory) data cube algorithms can be used for the subcube computations on each processor, which supports the transfer of optimized sequential data cube code to a parallel setting. The bottom-up partitioning strategy balances the number of single-attribute external-memory sorts performed by each processor. The top-down strategy partitions a weighted tree in which the weights reflect algorithm-specific cost measures such as estimated group-by sizes. Both partitioning approaches can be implemented on any shared-disk parallel machine composed of p processors connected via an interconnection fabric and with access to a shared parallel disk array. Experimental results show that our partitioning strategies generate a close-to-optimal load balance between processors.
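
To make the idea of weight-based load balancing concrete, the following is a minimal Python sketch. It is not the paper's algorithm: it enumerates the 2^d group-bys of a small cube, assigns each a hypothetical cost, and distributes them over p processors with a simple greedy heuristic (heaviest task first, to the least-loaded processor). The abstract describes partitioning a weighted tree into subcubes, which this flat sketch does not do; in particular it ignores the parent-child relationships between group-bys. The function names, the cost estimate (product of attribute cardinalities capped at the row count), and the greedy rule are all assumptions made for illustration.

```python
from itertools import combinations
import heapq


def group_bys(dims):
    """Enumerate all 2^d group-bys of a d-dimensional cube, largest first."""
    return [combo
            for k in range(len(dims), -1, -1)
            for combo in combinations(dims, k)]


def estimated_size(group_by, cardinality, n_rows):
    """Hypothetical cost measure (an assumption, not taken from the paper):
    product of attribute cardinalities, capped at the number of input rows."""
    size = 1
    for attr in group_by:
        size *= cardinality[attr]
    return min(size, n_rows)


def greedy_partition(weights, p):
    """Assign weighted tasks to p processors: heaviest task first,
    each one going to the currently least-loaded processor.
    This is a generic heuristic standing in for the paper's tree partitioning."""
    heap = [(0, proc) for proc in range(p)]  # (current load, processor id)
    heapq.heapify(heap)
    assignment = {proc: [] for proc in range(p)}
    for task, weight in sorted(weights.items(), key=lambda kv: -kv[1]):
        load, proc = heapq.heappop(heap)
        assignment[proc].append(task)
        heapq.heappush(heap, (load + weight, proc))
    loads = {proc: sum(weights[t] for t in tasks)
             for proc, tasks in assignment.items()}
    return assignment, loads


if __name__ == "__main__":
    dims = ["A", "B", "C"]
    cardinality = {"A": 10, "B": 20, "C": 5}  # made-up example values
    weights = {gb: estimated_size(gb, cardinality, n_rows=500)
               for gb in group_bys(dims)}
    assignment, loads = greedy_partition(weights, p=2)
    for proc in sorted(assignment):
        print(proc, loads[proc], assignment[proc])
```

With these made-up cardinalities the full group-by ABC dominates the total cost, so no assignment of whole group-bys to two processors can be perfectly balanced. This illustrates why the choice of cost measure and of partitioning granularity matters for the quality of the load balance.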