Modern enterprises have to deal with a variety of analytical queries over very large datasets. In this respect, Hadoop has gained much popularity since it scales to thousands of nodes and terabytes of data. However, Hadoop suffers from poor performance, especially poor I/O performance. Several works have proposed alternative data storage schemes for Hadoop in order to improve query performance. However, many of these works make deep changes to Hadoop or HDFS. As a result, they are (i) difficult for many users to adopt, and (ii) incompatible with future Hadoop releases. In this paper, we present CARTILAGE, a comprehensive data storage framework built on top of HDFS. CARTILAGE gives users full control over their data storage, including data partitioning, data replication, data layouts, and data placement. Furthermore, CARTILAGE can be layered on top of an existing HDFS installation, which means that Hadoop, as well as other query engines, can readily make use of it. We describe several use cases of CARTILAGE and propose to demonstrate its flexibility and efficiency through a set of novel scenarios.
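To make the "layered on top of HDFS" idea concrete, the sketch below is an illustration only, not CARTILAGE's actual API: it shows that the vanilla HDFS client already exposes per-file control over replication factor and block size, the kind of hooks a storage framework can exploit for partition-specific storage decisions without modifying Hadoop or HDFS itself. The path names and parameter values are hypothetical.

```java
// Illustrative sketch only (not the CARTILAGE API): per-file replication and
// block size are chosen by the HDFS client, so a storage framework layered on
// top of an unmodified HDFS installation can vary them per partition.
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsLayoutSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();   // reads core-site.xml / hdfs-site.xml
        try (FileSystem fs = FileSystem.get(conf)) {
            // Hypothetical path for one partition of a table.
            Path partition = new Path("/warehouse/lineitem/part-00000");

            short replication = 3;                  // replication factor for this file only
            long blockSize = 128L * 1024 * 1024;    // 128 MB blocks for this file only
            int bufferSize = 4096;

            try (FSDataOutputStream out =
                     fs.create(partition, true /* overwrite */, bufferSize, replication, blockSize)) {
                // A framework would write partitioned, layout-specific bytes here
                // (e.g., a row-oriented or column-oriented serialization of the partition).
                out.write("...partition data...".getBytes(StandardCharsets.UTF_8));
            }
        }
    }
}
```

Because these knobs are part of the standard HDFS client interface, files written this way remain ordinary HDFS files and stay readable by Hadoop jobs and other query engines.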