This paper presents a general methodology for the efficient parallelization of existing data cube construction algorithms. We describe two different partitioning strategies, one for top-down and one for bottom-up cube algorithms. Both strategies assign subcubes to individual processors in such a way that the loads assigned to the processors are balanced. Our methods reduce interprocessor communication overhead by partitioning the load in advance instead of computing each individual group-by in parallel. Our partitioning strategies create a small number of coarse tasks, which allows prefixes and sort orders to be shared between different group-by computations. Our methods enable code reuse by permitting the use of existing sequential (external memory) data cube algorithms for the subcube computations on each processor. This supports the transfer of optimized sequential data cube code to a parallel setting.

The bottom-up partitioning strategy balances the number of single-attribute external memory sorts made by each processor. The top-down strategy partitions a weighted tree in which the weights reflect algorithm-specific cost measures such as estimated group-by sizes. Both partitioning approaches can be implemented on any shared-disk parallel machine composed of p processors connected via an interconnection fabric and with access to a shared parallel disk array.

We have implemented our parallel top-down data cube construction method in C++, with the MPI message passing library for communication and the LEDA library for the required graph algorithms. We tested our code on an eight-processor cluster, using a variety of data sets with a range of sizes, dimensions, densities, and skews. Comparison tests were performed on a SunFire 6800. The tests show that our partitioning strategies generate a close to optimal load balance between processors. The observed run times show an optimal speedup of p.
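To make the idea of partitioning the load in advance concrete, here is a minimal C++ sketch of assigning weighted coarse tasks to p processors. It is not the paper's tree partitioning method: it swaps in the standard longest-processing-time-first (LPT) greedy heuristic and ignores the tree structure entirely. The names `Assignment`, `assign_lpt`, and `imbalance` are hypothetical, and the weights are assumed to be precomputed cost estimates such as estimated group-by sizes.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <numeric>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical result type (not from the paper).
struct Assignment {
    std::vector<std::size_t> owner;   // owner[i] = processor that gets task i
    std::vector<std::uint64_t> load;  // load[q]  = total weight placed on processor q
};

// LPT greedy heuristic: visit tasks in order of decreasing weight and give
// each one to the processor that currently has the smallest total load.
// Assumption: weight[i] is an estimated cost of coarse task i.
inline Assignment assign_lpt(const std::vector<std::uint64_t>& weight, std::size_t p) {
    assert(p > 0);
    Assignment a{std::vector<std::size_t>(weight.size()),
                 std::vector<std::uint64_t>(p, 0)};

    // Task indices sorted by decreasing weight (stable, so ties keep input order).
    std::vector<std::size_t> order(weight.size());
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::stable_sort(order.begin(), order.end(),
                     [&](std::size_t x, std::size_t y) { return weight[x] > weight[y]; });

    // Min-heap of (current load, processor id); ties go to the lower id.
    using Slot = std::pair<std::uint64_t, std::size_t>;
    std::priority_queue<Slot, std::vector<Slot>, std::greater<Slot>> heap;
    for (std::size_t q = 0; q < p; ++q) heap.emplace(std::uint64_t{0}, q);

    for (std::size_t i : order) {
        Slot s = heap.top();
        heap.pop();
        a.owner[i] = s.second;
        a.load[s.second] += weight[i];
        heap.emplace(a.load[s.second], s.second);
    }
    return a;
}

// Ratio of the heaviest processor load to the average load (1.0 = perfectly balanced).
inline double imbalance(const Assignment& a) {
    if (a.load.empty()) return 1.0;
    const std::uint64_t total =
        std::accumulate(a.load.begin(), a.load.end(), std::uint64_t{0});
    if (total == 0) return 1.0;
    const std::uint64_t heaviest = *std::max_element(a.load.begin(), a.load.end());
    return static_cast<double>(heaviest) * static_cast<double>(a.load.size()) /
           static_cast<double>(total);
}
```

For example, `assign_lpt({8, 7, 6, 5, 4}, 2)` yields processor loads of 17 and 13, although a 15/15 split exists. A greedy assignment of this kind is only a heuristic, which is one reason a dedicated partitioning algorithm may be preferable when a close to optimal balance is required.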