TransMR: data-centric programming beyond data parallelism

  • Authors:
  • Naresh Rapolu; Karthik Kambatla; Suresh Jagannathan; Ananth Grama

  • Affiliations:
  • Dept. of Computer Science, Purdue University (all authors)

  • Venue:
  • HotCloud '11: Proceedings of the 3rd USENIX Conference on Hot Topics in Cloud Computing
  • Year:
  • 2011

Abstract

MapReduce and related data-centric programming models have proven effective for a variety of large-scale distributed computations, particularly those that exhibit data parallelism. The fault-tolerance model underlying these programming environments relies on deterministic replay, which makes data sharing (side effects) across computations hard to support: a replayed task may re-apply writes that its failed attempt already made visible. This significantly limits the application scope of MapReduce and related models. This paper: (i) investigates data sharing (side effects) in programming models operating on distributed key-value stores, specifically the inconsistencies between the fault-recovery mechanisms of the execution and storage layers; (ii) defines semantics for a novel programming model, TransMR (Transactional MapReduce), which addresses these inconsistencies; and (iii) demonstrates, using a prototype implementation of the proposed semantics, a broader application scope and improved performance through data sharing across computations.
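
The conflict between replay-based recovery and side effects can be illustrated with a toy example. The sketch below is not the TransMR API or the paper's prototype; it is a minimal single-process illustration, and every name in it (KVStore, Transaction, count_words_unsafe, count_words_transactional, run_with_replay) is hypothetical. It assumes a task that fails partway through and is then replayed from the start, and it contrasts writing directly to a shared store with buffering writes and applying them only on commit. Concurrency control and distribution are deliberately left out.

```python
class KVStore:
    """Toy in-memory key-value store (hypothetical, for illustration only)."""

    def __init__(self):
        self.data = {}


class Transaction:
    """Buffers writes locally and applies them to the store only on commit."""

    def __init__(self, store):
        self.store = store
        self.writes = {}

    def get(self, key, default=0):
        # Read our own buffered write first, then fall back to the store.
        return self.writes.get(key, self.store.data.get(key, default))

    def put(self, key, value):
        self.writes[key] = value

    def commit(self):
        # In this single-process sketch, commit is one dictionary update.
        self.store.data.update(self.writes)


def count_words_unsafe(store, words, fail_after=None):
    """Writes each increment straight to the shared store (visible side effects)."""
    for i, word in enumerate(words):
        if fail_after is not None and i == fail_after:
            raise RuntimeError("simulated task failure")
        store.data[word] = store.data.get(word, 0) + 1


def count_words_transactional(store, words, fail_after=None):
    """Buffers increments; a failure before commit leaves the store untouched."""
    txn = Transaction(store)
    for i, word in enumerate(words):
        if fail_after is not None and i == fail_after:
            raise RuntimeError("simulated task failure")
        txn.put(word, txn.get(word) + 1)
    txn.commit()


def run_with_replay(task, words):
    """Runs a task that fails once at index 2, then replays it from the start."""
    store = KVStore()
    try:
        task(store, words, fail_after=2)
    except RuntimeError:
        task(store, words)  # replay, as a MapReduce-style scheduler would
    return store.data


if __name__ == "__main__":
    words = ["a", "b", "a"]
    print("unsafe:       ", run_with_replay(count_words_unsafe, words))
    print("transactional:", run_with_replay(count_words_transactional, words))
```

In the unsafe variant, the two increments applied before the failure remain in the store and are applied again on replay, so the counts are inflated. In the transactional variant, the failed attempt never commits, so the replay produces the same result as a single clean run.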