Implementing data cubes efficiently
SIGMOD '96 Proceedings of the 1996 ACM SIGMOD international conference on Management of data
An array-based algorithm for simultaneous multidimensional aggregates
SIGMOD '97 Proceedings of the 1997 ACM SIGMOD international conference on Management of data
Exploratory mining and pruning optimizations of constrained associations rules
SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data
Bottom-up computation of sparse and Iceberg CUBE
SIGMOD '99 Proceedings of the 1999 ACM SIGMOD international conference on Management of data
High Performance OLAP and Data Mining on Parallel Computers
Data Mining and Knowledge Discovery
Parallel Formulations of Decision-Tree Classification Algorithms
Data Mining and Knowledge Discovery
Parallel and Distributed Association Mining: A Survey
IEEE Concurrency
ICDE '97 Proceedings of the Thirteenth International Conference on Data Engineering
Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Total
ICDE '96 Proceedings of the Twelfth International Conference on Data Engineering
Fast Computation of Sparse Datacubes
VLDB '97 Proceedings of the 23rd International Conference on Very Large Data Bases
Materialized Views Selection in a Multidimensional Database
VLDB '97 Proceedings of the 23rd International Conference on Very Large Data Bases
Materialized View Selection for Multidimensional Datasets
VLDB '98 Proceedings of the 24th International Conference on Very Large Data Bases
Computing Iceberg Queries Efficiently
VLDB '98 Proceedings of the 24th International Conference on Very Large Data Bases
Dynamic Load Balancing for Parallel Association Rule Mining on Heterogenous PC Cluster Systems
VLDB '99 Proceedings of the 25th International Conference on Very Large Data Bases
Explaining Differences in Multidimensional Aggregates
VLDB '99 Proceedings of the 25th International Conference on Very Large Data Bases
On the Computation of Multidimensional Aggregates
VLDB '96 Proceedings of the 22nd International Conference on Very Large Data Bases
Distributed and Parallel Databases - Special issue: Parallel and distributed data mining
Parallel data intensive computing in scientific and commercial applications
Parallel Computing - Parallel data-intensive algorithms and applications
Future Generation Computer Systems - Selected papers from CCGRID 2002
Parallel ROLAP Data Cube Construction on Shared-Nothing Multiprocessors
Distributed and Parallel Databases
PnP: Parallel and External Memory Iceberg Cube Computation
ICDE '05 Proceedings of the 21st International Conference on Data Engineering
Parallel querying of ROLAP cubes in the presence of hierarchies
Proceedings of the 8th ACM international workshop on Data warehousing and OLAP
The cgmCUBE project: Optimizing parallel data cube generation for ROLAP
Distributed and Parallel Databases
The generalized MDL approach for summarization
VLDB '02 Proceedings of the 28th international conference on Very Large Data Bases
Finding hierarchical heavy hitters in streaming data
ACM Transactions on Knowledge Discovery from Data (TKDD)
PnP: sequential, external memory, and parallel iceberg cube computation
Distributed and Parallel Databases
Bellwether analysis: Searching for cost-effective query-defined predictors in large databases
ACM Transactions on Knowledge Discovery from Data (TKDD)
Answering aggregate keyword queries on relational databases using minimal group-bys
Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology
Strategies for complex data cube queries
Applied Intelligence
Comparing GPU and CPU in OLAP cubes creation
SOFSEM'11 Proceedings of the 37th international conference on Current trends in theory and practice of computer science
Dynamic construction of user defined virtual cubes
NGITS'06 Proceedings of the 6th international conference on Next Generation Information Technologies and Systems
Parallel Real-Time OLAP on Multi-core Processors
CCGRID '12 Proceedings of the 2012 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (ccgrid 2012)
Efficient distributed parallel top-down computation of ROLAP data cube using mapreduce
DaWaK'12 Proceedings of the 14th international conference on Data Warehousing and Knowledge Discovery
A New Parallel Data Cube Construction Scheme
International Journal of Grid and High Performance Computing
In this paper, we investigate using a low-cost PC cluster to parallelize the computation of iceberg-cube queries. We concentrate on techniques directed towards online querying of large, high-dimensional datasets, where it is assumed that the total cube has not been precomputed. The algorithmic space we explore considers trade-offs between parallelism, computation, and I/O. Our main contribution is the development and comprehensive evaluation of several novel parallel algorithms. Specifically: (1) Algorithm RP is a straightforward parallel version of BUC [BR99]; (2) Algorithm BPP attempts to reduce I/O by outputting results more efficiently; (3) Algorithm ASL, which maintains the cells of a cuboid in a skip list, is designed to give load balancing the utmost priority; and (4) alternatively, Algorithm PT load-balances by using binary partitioning to divide the cube lattice as evenly as possible. We present a thorough performance evaluation of all these algorithms over a variety of parameters, including the dimensionality of the cube, the sparseness of the cube, the selectivity of the constraints, the number of processors, and the size of the dataset. A key finding is that no single algorithm fits all situations. We recommend a “recipe” that uses PT as the default algorithm but may also deploy ASL under specific circumstances.
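The abstract takes BUC [BR99] as the sequential baseline that Algorithm RP parallelizes, and describes PT as splitting the cube lattice in half repeatedly. The following is a minimal Python sketch under stated assumptions, not the paper's code: COUNT is the only aggregate, rows are tuples of dimension values, and partitioning uses hashing rather than BUC's sorting. The names `buc`, `split_lattice`, and `ALL` are hypothetical. `split_lattice` illustrates one plausible reading of binary lattice partitioning (split the cuboids by whether they include dimension 0, then dimension 1, and so on); it is not claimed to be the exact PT algorithm.

```python
from collections import defaultdict
from itertools import combinations

ALL = "*"  # hypothetical marker for an aggregated-away dimension


def buc(rows, num_dims, minsup):
    """Sketch of BUC-style bottom-up iceberg-cube computation (COUNT only).

    Returns {cell: count} for every group-by cell with count >= minsup.
    A cell is a tuple of length num_dims; ALL marks a dimension not grouped on.
    """
    result = {}

    def recurse(partition, start_dim, cell):
        # Iceberg pruning: COUNT is anti-monotone, so if this partition is
        # below minsup, no finer cell refined from it can reach minsup either.
        if len(partition) < minsup:
            return
        result[tuple(cell)] = len(partition)
        for d in range(start_dim, num_dims):
            # Partition the current rows on dimension d (hashing, not sorting).
            groups = defaultdict(list)
            for row in partition:
                groups[row[d]].append(row)
            for value, sub in groups.items():
                cell[d] = value
                # Only refine on later dimensions, so each cuboid is visited once.
                recurse(sub, d + 1, cell)
            cell[d] = ALL  # restore before moving to the next dimension

    recurse(list(rows), 0, [ALL] * num_dims)
    return result


def split_lattice(num_dims, num_parts):
    """Hypothetical binary lattice partitioning: split every group of cuboids
    by whether it contains dimension d, for d = 0, 1, ..., until there are
    num_parts groups. Each split halves a group's cuboid count exactly."""
    cuboids = [frozenset(c)
               for k in range(num_dims + 1)
               for c in combinations(range(num_dims), k)]
    groups = [cuboids]
    d = 0
    while len(groups) < num_parts and d < num_dims:
        next_groups = []
        for g in groups:
            next_groups.append([c for c in g if d in c])
            next_groups.append([c for c in g if d not in c])
        groups = next_groups
        d += 1
    return groups
```

For example, `buc([("a", "x"), ("a", "y"), ("b", "x")], 2, 2)` keeps only the apex cell `("*", "*")` with count 3 plus `("a", "*")` and `("*", "x")` with count 2 each. Equal cuboid counts per group do not guarantee equal work, since pruning and partition sizes depend on the data; the sketch only equalizes the number of cuboids per task.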