Building cubes with MapReduce

Authors:
Alberto Abelló; Jaume Ferrarons; Oscar Romero
Affiliations:
Universitat Politècnica de Catalunya, BarcelonaTech, Barcelona, Spain (all three authors)
Venue:
Proceedings of the ACM 14th international workshop on Data Warehousing and OLAP
Year:
2011

Abstract

In recent years, the problems of using generic storage techniques for very specific applications have been identified and outlined. As a result, alternatives to relational DBMSs (e.g., BigTable) are flourishing. At the same time, cloud computing is already a reality that helps save money by eliminating fixed hardware and software costs in favor of paying per use. Specific software tools for exploiting the cloud are also available, and the trend is toward tools based on the MapReduce paradigm developed by Google. In this paper, we explore storing data in a cloud, using BigTable to hold the corporate historical data and MapReduce as an agile mechanism to deploy cubes in ad hoc Data Marts. Our main contribution is a comparison of three approaches to retrieving data cubes from BigTable by means of MapReduce, together with criteria for choosing among them.
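To illustrate the general idea of computing a cube with the MapReduce paradigm, the following Python sketch simulates the map, shuffle and reduce phases in memory. It is not one of the three approaches compared in the paper. The record layout, the names `FACTS`, `map_fn`, `reduce_fn` and `run_mapreduce`, and the `*` marker for an aggregated-away dimension are all hypothetical; a real deployment would read rows from BigTable (or a clone such as HBase) and run the job on a MapReduce framework such as Hadoop.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical fact rows, modeled loosely as BigTable-style records:
# a row key mapped to {column: value}. Names and data are illustrative only.
FACTS = {
    "sale#1": {"product": "pen", "city": "Barcelona", "amount": 3},
    "sale#2": {"product": "pen", "city": "Girona", "amount": 5},
    "sale#3": {"product": "ink", "city": "Barcelona", "amount": 2},
}

DIMENSIONS = ("product", "city")
MEASURE = "amount"
ALL = "*"  # hypothetical marker for a dimension that has been aggregated away


def map_fn(row_key, columns):
    """Emit one (cube cell, measure) pair for every subset of dimensions kept."""
    for r in range(len(DIMENSIONS) + 1):
        for kept in combinations(DIMENSIONS, r):
            cell = tuple(columns[d] if d in kept else ALL for d in DIMENSIONS)
            yield cell, columns[MEASURE]


def reduce_fn(cell, values):
    """Aggregate all measures that fell into the same cube cell (SUM)."""
    return cell, sum(values)


def run_mapreduce(records, mapper, reducer):
    """Tiny sequential stand-in for the map, shuffle and reduce phases."""
    groups = defaultdict(list)
    for key, value in records.items():
        for out_key, out_value in mapper(key, value):
            groups[out_key].append(out_value)  # shuffle: group values by key
    return dict(reducer(k, vs) for k, vs in groups.items())


cube = run_mapreduce(FACTS, map_fn, reduce_fn)

if __name__ == "__main__":
    for cell, total in sorted(cube.items()):
        print(cell, total)
```

Emitting every subset of dimensions from the mapper yields all 2^n cuboids of an n-dimensional cube in a single job, at the cost of 2^n emitted pairs per input row; this is just one possible design, and the trade-offs between alternatives are what the paper's comparison addresses.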