Communication and Memory Optimal Parallel Data Cube Construction

Authors:
Ruoming Jin;Karthikeyan Vaidyanathan;Ge Yang;Gagan Agrawal
Affiliations:
-;-;-;IEEE Computer Society
Venue:
IEEE Transactions on Parallel and Distributed Systems
Year:
2005

Citing 14
Cited 3

Introduction to algorithms

Introduction to algorithms
Implementing data cubes efficiently

SIGMOD '96 Proceedings of the 1996 ACM SIGMOD international conference on Management of data
An array-based algorithm for simultaneous multidimensional aggregates

SIGMOD '97 Proceedings of the 1997 ACM SIGMOD international conference on Management of data
Bottom-up computation of sparse and Iceberg CUBE

SIGMOD '99 Proceedings of the 1999 ACM SIGMOD international conference on Management of data
PARSIMONY: An infrastructure for parallel multidimensional analysis and data mining

Journal of Parallel and Distributed Computing - Special issue on high-performance data mining
Efficient computation of Iceberg cubes with complex measures

SIGMOD '01 Proceedings of the 2001 ACM SIGMOD international conference on Management of data
Dwarf: shrinking the PetaCube

Proceedings of the 2002 ACM SIGMOD international conference on Management of data
Parallelizing the Data Cube

Distributed and Parallel Databases - Special issue: Parallel and distributed data mining
Fast Computation of Sparse Datacubes

VLDB '97 Proceedings of the 23rd International Conference on Very Large Data Bases
Multi-Dimensional Database Allocation for Parallel Data Warehouses

VLDB '00 Proceedings of the 26th International Conference on Very Large Data Bases
On the Computation of Multidimensional Aggregates

VLDB '96 Proceedings of the 22th International Conference on Very Large Data Bases
A Case for Parallelism in Data Warehousing and OLAP

DEXA '98 Proceedings of the 9th International Workshop on Database and Expert Systems Applications
Implementing Data Cube Construction using a Cluster Middleware: Algorithms, Implementation Experience, and Performance Evaluation

CCGRID '02 Proceedings of the 2nd IEEE/ACM International Symposium on Cluster Computing and the Grid
Range CUBE: Efficient Cube Computation by Exploiting Data Correlation

ICDE '04 Proceedings of the 20th International Conference on Data Engineering

Sidera: a cluster-based server for online analytical processing

OTM'07 Proceedings of the 2007 OTM confederated international conference on On the move to meaningful internet systems: CoopIS, DOA, ODBASE, GADA, and IS - Volume Part II
An incremental maintenance scheme of data cubes

DASFAA'08 Proceedings of the 13th international conference on Database systems for advanced applications
A New Parallel Data Cube Construction Scheme

International Journal of Grid and High Performance Computing

Quantified Score

Hi-index	0.00

Visualization

Abstract

Data cube construction is a commonly used operation in data warehouses. Because of the volume of data that is stored and analyzed in a data warehouse and the amount of computation involved in data cube construction, it is natural to consider parallel machines for this operation. This paper addresses a number of algorithmic issues in parallel data cube construction. First, we present an aggregation tree for sequential (and parallel) data cube construction which has minimally bounded memory requirements. An aggregation tree is parameterized by the ordering of dimensions. We present a parallel algorithm based upon the aggregation tree. We analyze the interprocessor communication volume and construct a closed form expression for it. We prove that the same ordering of the dimensions in the aggregation tree minimizes both the computational and communication requirements. We also describe a method for partitioning the initial array and prove that it minimizes the communication volume. Finally, in the cases when memory may be a bottleneck, we describe how tiling can help scale sequential and parallel data cube construction. Experimental results from implementation of our algorithms on a cluster of workstations show the effectiveness of our algorithms and validate our theoretical results.