Efficient distributed parallel top-down computation of ROLAP data cube using MapReduce

Authors:
Suan Lee; Jinho Kim; Yang-Sae Moon; Wookey Lee
Affiliations:
Department of Computer Science, Kangwon National University, Kangwon, Korea; Department of Computer Science, Kangwon National University, Kangwon, Korea; Department of Computer Science, Kangwon National University, Kangwon, Korea; Department of Industrial Engineering, Inha University, Incheon, Korea
Venue:
DaWaK'12 Proceedings of the 14th international conference on Data Warehousing and Knowledge Discovery
Year:
2012

References:
Bottom-up computation of sparse and Iceberg CUBE. SIGMOD '99 Proceedings of the 1999 ACM SIGMOD international conference on Management of data
Iceberg-cube computation with PC clusters. SIGMOD '01 Proceedings of the 2001 ACM SIGMOD international conference on Management of data
Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Total. ICDE '96 Proceedings of the Twelfth International Conference on Data Engineering
On the Computation of Multidimensional Aggregates. VLDB '96 Proceedings of the 22th International Conference on Very Large Data Bases
The cgmCUBE project: Optimizing parallel data cube generation for ROLAP. Distributed and Parallel Databases
ROLAP implementations of the data cube. ACM Computing Surveys (CSUR)
MapReduce: simplified data processing on large clusters. Communications of the ACM - 50th anniversary issue: 1958 - 2008
A Parallel Algorithm for Closed Cube Computation. ICIS '08 Proceedings of the Seventh IEEE/ACIS International Conference on Computer and Information Science (icis 2008)
A MapReduceMerge-based Data Cube Construction Method. GCC '10 Proceedings of the 2010 Ninth International Conference on Grid and Cloud Computing
Distributed cube materialization on holistic measures. ICDE '11 Proceedings of the 2011 IEEE 27th International Conference on Data Engineering

Abstract

The computation of a multidimensional OLAP (On-Line Analytical Processing) data cube is time-consuming, because a data cube with D dimensions consists of 2^D cuboids. To build ROLAP (Relational OLAP) data cubes efficiently, existing algorithms (e.g., GBLP, PipeSort, PipeHash, and BUC) use several strategies: sharing sort costs and input data scans, reducing the amount of computation, and utilizing parallel processing techniques. Meanwhile, MapReduce has recently emerged as a framework for processing huge volumes of data, such as web-scale data, in a distributed and parallel manner using a large number of computers (e.g., several hundred or several thousand). In the MapReduce framework, the degree of parallelism matters more for reducing total execution time than elaborate optimization strategies do. In this paper, we propose a distributed parallel processing algorithm, called MRPipeLevel, which takes advantage of the MapReduce framework. It is based on the existing PipeSort algorithm, which is one of the most efficient algorithms for top-down cube computation. The proposed MRPipeLevel algorithm parallelizes cube computation and, at the same time, reduces the number of data scans by pipelining. We implemented and evaluated the proposed algorithm under the MapReduce framework. Through the experiments, we also identify factors that improve performance when MapReduce processes very large data.
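
To make the cost argument concrete, the following is a minimal sketch of why a cube with D dimensions yields 2^D cuboids, and of how a naive map/reduce-style pass could aggregate all of them. It is not the MRPipeLevel or PipeSort algorithm: it shares no sorts and pipelines nothing. The sample rows, the `sales` measure, and the function names `cuboids`, `map_phase`, and `reduce_phase` are all assumptions made for illustration, and the two phases run in a single Python process rather than on a cluster.

```python
from collections import defaultdict
from itertools import combinations


def cuboids(dimensions):
    """Yield every subset of the dimensions, from the base cuboid to the apex."""
    for size in range(len(dimensions), -1, -1):
        for combo in combinations(dimensions, size):
            yield combo


def map_phase(rows, dimensions):
    """Emit ((cuboid, group key), measure) for every row and every cuboid."""
    for row in rows:
        for cuboid in cuboids(dimensions):
            group_key = tuple(row[d] for d in cuboid)
            yield (cuboid, group_key), row["sales"]  # "sales" is a hypothetical measure


def reduce_phase(pairs):
    """Sum the measures that share the same (cuboid, group key)."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


# Hypothetical fact table with D = 3 dimensions.
rows = [
    {"city": "Seoul", "item": "pen", "year": 2011, "sales": 3},
    {"city": "Seoul", "item": "ink", "year": 2011, "sales": 5},
    {"city": "Busan", "item": "pen", "year": 2012, "sales": 2},
]
dims = ("city", "item", "year")

cube = reduce_phase(map_phase(rows, dims))

print(len(list(cuboids(dims))))        # 8, i.e. 2^3 cuboids
print(cube[((), ())])                  # 10, the apex (grand total)
print(cube[(("city",), ("Seoul",))])   # 8, total sales for Seoul
```

Every row is emitted once per cuboid here, so the intermediate data grows by a factor of 2^D. That blow-up is the kind of cost that sharing sorts and scans, as the algorithms named in the abstract do, is meant to reduce.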