Themis: an I/O-efficient MapReduce

  • Authors: Alexander Rasmussen, Vinh The Lam, Michael Conley, George Porter, Rishi Kapoor, Amin Vahdat
  • Affiliations: UC San Diego (all authors); Amin Vahdat also with Google, Inc.
  • Venue: Proceedings of the Third ACM Symposium on Cloud Computing
  • Year: 2012

Abstract

"Big Data" computing increasingly uses the MapReduce programming model for scalable processing of large data collections. Many MapReduce jobs are I/O-bound, so minimizing the number of I/O operations is critical to improving their performance. In this work, we present Themis, a MapReduce implementation that reads and writes each data record exactly twice, the minimum possible for data sets that cannot fit in memory. To minimize I/O, Themis makes fundamentally different design decisions from previous MapReduce implementations. Themis performs a wide variety of MapReduce jobs, including click log analysis, DNA read sequence alignment, and PageRank, at nearly the speed of TritonSort's record-setting sort performance [29].
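
The "exactly twice" property can be illustrated with a minimal two-phase sort sketch. This is not Themis's implementation, and the abstract does not describe its internals: the function name `two_pass_sort`, the integer-per-line record format, and the parameters `num_partitions` and `key_max` are all assumptions made for illustration. The sketch further assumes keys lie in [0, key_max) and that each partition fits in memory.

```python
import os
import tempfile


def two_pass_sort(input_path, output_path, num_partitions, key_max):
    """Sort integer records (one per line) using two reads and two writes each.

    Hypothetical illustration only. Assumes keys lie in [0, key_max) and
    that every range partition is small enough to sort in memory.
    """
    io_counts = {"reads": 0, "writes": 0}
    with tempfile.TemporaryDirectory() as tmp_dir:
        part_paths = [
            os.path.join(tmp_dir, f"part-{i}") for i in range(num_partitions)
        ]
        parts = [open(path, "w") for path in part_paths]
        try:
            # Phase one: read each record once and write it once
            # to the partition that covers its key range.
            with open(input_path) as src:
                for line in src:
                    key = int(line)
                    io_counts["reads"] += 1
                    index = key * num_partitions // key_max
                    parts[index].write(f"{key}\n")
                    io_counts["writes"] += 1
        finally:
            for handle in parts:
                handle.close()

        # Phase two: read each partition once, sort it in memory,
        # and write it once. Partitions are ordered by key range,
        # so appending them in order yields a fully sorted output.
        with open(output_path, "w") as dst:
            for path in part_paths:
                with open(path) as part:
                    keys = [int(line) for line in part]
                io_counts["reads"] += len(keys)
                keys.sort()
                for key in keys:
                    dst.write(f"{key}\n")
                io_counts["writes"] += len(keys)
    return io_counts
```

For an input of n records, the returned counters come to 2n reads and 2n writes: one read and one write per record in each phase. A data set too large for memory cannot avoid the intermediate write and read, which is why two passes is the floor the abstract refers to.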