Efficient distributed parallel top-down computation of ROLAP data cube using mapreduce

  • Authors:
  • Suan Lee;Jinho Kim;Yang-Sae Moon;Wookey Lee

  • Affiliations:
  • Department of Computer Science, Kangwon National University, Kangwon, Korea;Department of Computer Science, Kangwon National University, Kangwon, Korea;Department of Computer Science, Kangwon National University, Kangwon, Korea;Department of Industrial Engineering, Inha University, Incheon, Korea

  • Venue:
  • DaWaK'12 Proceedings of the 14th international conference on Data Warehousing and Knowledge Discovery
  • Year:
  • 2012

Quantified Score

Hi-index 0.00

Visualization

Abstract

The computation of multidimensional OLAP(On-Line Analytical Processing) data cube takes much time, because a data cube with D dimensions consists of 2D cuboids. To build ROLAP(Relational OLAP) data cubes efficiently, existing algorithms (e.g., GBLP, PipeSort, PipeHash, BUC, etc) use several strategies sharing sort cost and input data scan, reducing data computation, and utilizing parallel processing techniques. On the other hand, MapReduce is recently emerging for the framework processing a huge volume of data like web-scale data in a distributed/parallel manner by using a large number of computers (e.g., several hundred or thousands). In the MapReduce framework, the degree of parallel processing is more important to reduce total execution time than elaborate strategies. In this paper, we propose a distributed parallel processing algorithm, called MRPipeLevel, which takes advantage of the MapReduce framework. It is based on the existing PipeSort algorithm which is one of the most efficient ones for top-down cube computation. The proposed MRPipeLevel algorithm parallelizes cube computation and reduces the number of data scan by pipelining at the same time. We implemented and evaluated the proposed algorithm under the MapReduce framework. Through the experiments, we also identify factors for performance enhancement in MapReduce to process very huge data.