Decoupling storage and computation in Hadoop with SuperDataNodes

  • Authors:
  • George Porter

  • Affiliations:
  • UC San Diego, La Jolla, CA

  • Venue:
  • ACM SIGOPS Operating Systems Review
  • Year:
  • 2010

Quantified Score

Hi-index 0.00

Visualization

Abstract

The rise of ad-hoc data-intensive computing has led to the development of data-parallel programming systems such as Map/Reduce and Hadoop, which achieve scalability by tightly coupling storage and computation. This can be limiting when the ratio of computation to storage is not known in advance, or changes over time. In this work, we examine decoupling storage and computation in Hadoop through SuperDataNodes, which are servers that contain an order of magnitude more disks than traditional Hadoop nodes. We found that SuperDataNodes are not only capable of supporting workloads with high storage-to-processing workloads, but in some cases can outperform traditional Hadoop deployments through better management of a large centralized pool of disks.