Breaking the MapReduce stage barrier

Authors:
Abhishek Verma;Brian Cho;Nicolas Zea;Indranil Gupta;Roy H. Campbell
Affiliations:
Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, USA 61801-2302 (all authors)
Venue:
Cluster Computing
Year:
2013

Abstract

The MapReduce model places a barrier between the Map and Reduce stages: no Reduce function runs until every Map task has finished. This barrier simplifies both programming and implementation, but in many situations it is overly restrictive and hurts performance. We therefore develop a method to break the barrier in MapReduce in a way that improves efficiency. Careful design of our barrier-less MapReduce framework preserves the generality of the original model and retains its ease of programming. We motivate our case with a wide variety of MapReduce applications, divided into seven classes, and study our barrier-less techniques experimentally on them. Our experiments show that our approach can achieve better job completion times than a traditional MapReduce framework. The gains come primarily from interleaving I/O with computation and from forgoing disk-intensive work. Job completion times fall by 25% on average and by 87% in the best case.
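
To make the contrast concrete, here is a minimal single-process sketch in Python of a word count computed with and without a stage barrier. It is an illustration only, not the authors' framework: the function names (`map_words`, `count_with_barrier`, `count_without_barrier`) are hypothetical, and it assumes the reduce operation can be applied incrementally to a per-key partial result, which may not hold for every application class the paper considers.

```python
from collections import defaultdict
from itertools import groupby
from typing import Dict, Iterable, Iterator, List, Tuple

Record = Tuple[str, int]  # (key, value) pair emitted by the map function


def map_words(line: str) -> Iterator[Record]:
    """Hypothetical map function: emit (word, 1) for each word in a line."""
    for word in line.split():
        yield (word.lower(), 1)


def count_with_barrier(lines: Iterable[str]) -> Dict[str, int]:
    """Traditional style: finish all maps, sort by key, then reduce."""
    # Buffer every intermediate record; nothing is reduced yet.
    intermediate: List[Record] = [
        record for line in lines for record in map_words(line)
    ]
    # The barrier: sorting needs the complete map output.
    intermediate.sort(key=lambda record: record[0])
    result: Dict[str, int] = {}
    for key, group in groupby(intermediate, key=lambda record: record[0]):
        result[key] = sum(value for _, value in group)
    return result


def count_without_barrier(lines: Iterable[str]) -> Dict[str, int]:
    """Barrier-less style: fold each record into a partial result on arrival."""
    partial: Dict[str, int] = defaultdict(int)
    for line in lines:
        for key, value in map_words(line):
            # Assumption: the reduce step (here, addition) can be applied
            # one record at a time, so no sort or full buffering is needed.
            partial[key] += value
    return dict(partial)


if __name__ == "__main__":
    sample = ["the quick brown fox", "The lazy dog", "the fox"]
    print(count_with_barrier(sample))
    print(count_without_barrier(sample))
```

Both functions produce the same counts. The second avoids the sort and the buffered intermediate list by holding one partial result per key in memory, which hints at the trade-off such a design has to manage when the number of distinct keys is large.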