Generate, test, and aggregate: a calculation-based framework for systematic parallel programming with mapreduce

  • Authors:
  • Kento Emoto;Sebastian Fischer;Zhenjiang Hu

  • Affiliations:
  • The University of Tokyo, Japan;National Institute of Informatics, Tokyo, Japan;National Institute of Informatics, Tokyo, Japan

  • Venue:
  • ESOP'12 Proceedings of the 21st European conference on Programming Languages and Systems
  • Year:
  • 2012

Quantified Score

Hi-index 0.00

Visualization

Abstract

MapReduce, being inspired by the map and reduce primitives available in many functional languages, is the de facto standard for large scale data-intensive parallel programming. Although it has succeeded in popularizing the use of the two primitives for hiding the details of parallel computation, little effort has been made to emphasize the programming methodology behind, which has been intensively studied in the functional programming and program calculation fields. We show that MapReduce can be equipped with a programming theory in calculational form. By integrating the generate-and-test programing paradigm and semirings for aggregation of results, we propose a novel parallel programming framework for MapReduce. The framework consists of two important calculation theorems: the shortcut fusion theorem of semiring homomorphisms bridges the gap between specifications and efficient implementations, and the filter-embedding theorem helps to develop parallel programs in a systematic and incremental way. We give nontrivial examples that demonstrate how to apply our framework.