Asynchronous Algorithms in MapReduce

  • Authors:
  • Karthik Kambatla;Naresh Rapolu;Suresh Jagannathan;Ananth Grama

  • Affiliations:
  • -;-;-;-

  • Venue:
  • CLUSTER '10 Proceedings of the 2010 IEEE International Conference on Cluster Computing
  • Year:
  • 2010

Quantified Score

Hi-index 0.01

Visualization

Abstract

Asynchronous algorithms have been demonstrated to improve scalability of a variety of applications in parallel environments. Their distributed adaptations have received relatively less attention, particularly in the context of conventional execution environments and associated overheads. One such framework, MapReduce, has emerged as a commonly used programming framework for large-scale distributed environments. While the MapReduce programming model has proved to be effective for data-parallel applications, significant questions relating to its performance and application scope remain unresolved. The strict synchronization between map and reduce phases limits expression of asynchrony and hence, does not readily support asynchronous algorithms. This paper investigates the notion of partial synchronizations in iterative MapReduce applications to overcome global synchronization overheads. The proposed approach applies a locality-enhancing partition on the computation. Map tasks execute local computations with (relatively) frequent local synchronizations, with less frequent global synchronizations. This approach yields significant performance gains in distributed environments, even though their serial operation counts are higher. We demonstrate these performance gains on asynchronous algorithms for diverse applications, including pagerank, shortestpath, and kmeans. We make the following specific contributions in the paper(i) we motivate the need to extend MapReduce with constructs for asynchrony, (ii) we propose an API to facilitate partial synchronizations combined with eager scheduling and locality enhancing techniques, and (iii) demonstrate performance improvements from our proposed extensions through a variety of applications from different domains.