CARTILAGE: adding flexibility to the Hadoop skeleton

  • Authors: Alekh Jindal; Jorge Quiané-Ruiz; Samuel Madden
  • Affiliations: MIT, Cambridge, MA, USA; QCRI, Doha, Qatar; MIT, Cambridge, MA, USA
  • Venue: Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data
  • Year: 2013

Abstract

Modern enterprises have to deal with a variety of analytical queries over very large datasets. In this respect, Hadoop has gained much popularity since it scales to thousands of nodes and terabytes of data. However, Hadoop suffers from poor performance, especially poor I/O performance. Several works have proposed alternative data storage for Hadoop in order to improve query performance. However, many of these works end up making deep changes to Hadoop or HDFS. As a result, they are (i) difficult for many users to adopt, and (ii) not compatible with future Hadoop releases. In this paper, we present CARTILAGE, a comprehensive data storage framework built on top of HDFS. CARTILAGE gives users full control over their data storage, including data partitioning, data replication, data layouts, and data placement. Furthermore, CARTILAGE can be layered on top of an existing HDFS installation. This means that Hadoop, as well as other query engines, can readily make use of CARTILAGE. We describe several use cases of CARTILAGE and propose to demonstrate its flexibility and efficiency through a set of novel scenarios.
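
To make the four storage decisions named in the abstract concrete (partitioning, replication, layout, placement), the following is a minimal in-memory sketch. It is not CARTILAGE's API and does not touch HDFS: `StoragePolicy`, `store`, `to_layout`, the record shape, and the node names are all hypothetical, chosen only to illustrate how one user-supplied policy could drive where and how each record is stored.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Record = Tuple[int, str]  # hypothetical record shape: (key, payload)


@dataclass
class StoragePolicy:
    """Hypothetical bundle of the four decisions listed in the abstract."""
    partition: Callable[[Record], int]   # record -> partition id
    replicas: int                        # copies kept of each partition
    layout: str                          # "row" or "column"
    place: Callable[[int, int], str]     # (partition id, replica no.) -> node


def to_layout(records: List[Record], layout: str) -> List[list]:
    """Serialize one non-empty partition as rows or as columns."""
    if layout == "column":
        return [list(col) for col in zip(*records)]
    return [list(rec) for rec in records]


def store(records: List[Record], policy: StoragePolicy) -> Dict[str, list]:
    """Apply the policy and return a mapping node -> [(partition id, block)]."""
    # 1. Partitioning: group records by the user-defined partition function.
    parts: Dict[int, List[Record]] = {}
    for rec in records:
        parts.setdefault(policy.partition(rec), []).append(rec)

    cluster: Dict[str, list] = {}
    for pid, recs in sorted(parts.items()):
        # 2. Layout: convert the partition to the requested physical form.
        block = to_layout(recs, policy.layout)
        # 3. Replication and 4. placement: one copy per replica number,
        #    each sent to the node the placement function picks.
        for rep in range(policy.replicas):
            cluster.setdefault(policy.place(pid, rep), []).append((pid, block))
    return cluster


nodes = ["node0", "node1", "node2"]  # hypothetical cluster
policy = StoragePolicy(
    partition=lambda rec: rec[0] % 2,                        # by key parity
    replicas=2,
    layout="column",
    place=lambda pid, rep: nodes[(pid + rep) % len(nodes)],  # round-robin
)
cluster = store([(1, "a"), (2, "b"), (3, "c"), (4, "d")], policy)
```

In this sketch the policy is plain data plus two functions, so swapping the layout or the placement rule does not require changing `store` itself. That separation is the point being illustrated; how CARTILAGE actually exposes these controls is not specified in the abstract.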