MapReduce is a programming model from Google for cluster-based computing in domains such as search engines, machine learning, and data mining. MapReduce provides automatic data management and fault tolerance to improve programmability of clusters. MapReduce's execution model includes an all-map-to-all-reduce communication, called the shuffle, across the network bisection. Some MapReductions move large amounts of data (e.g., as much as the input data), stressing the bisection bandwidth and introducing significant runtime overhead. Optimizing such shuffle-heavy MapReductions is important because (1) they include key applications (e.g., inverted indexing for search engines and data clustering for machine learning) and (2) they run longer than shuffle-light MapReductions (e.g., 5x longer). In MapReduce, the asynchronous nature of the shuffle results in some overlap between the shuffle and map. Unfortunately, this overlap is insufficient in shuffle-heavy MapReductions. We propose MapReduce with communication overlap (MaRCO) to achieve nearly full overlap via the novel idea of including reduce in the overlap. While MapReduce lazily performs reduce computation only after receiving all the map data, MaRCO employs eager reduce to process partial data from some map tasks while overlapping with other map tasks' communication. MaRCO's approach of hiding the latency of the inevitably high shuffle volume of shuffle-heavy MapReductions is fundamental for achieving performance. We implement MaRCO in Hadoop's MapReduce and show that on a 128-node Amazon EC2 cluster, MaRCO achieves 23% average speed-up over Hadoop for shuffle-heavy MapReductions.
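The lazy-versus-eager reduce distinction can be sketched in a few lines. The following is an illustrative toy (not MaRCO's actual Hadoop implementation): a word-count reduction, where `lazy_reduce` models Hadoop waiting for the entire shuffle before reducing, and `eager_reduce` models MaRCO folding each map task's partial output into a running reduction as it arrives, so reduce computation overlaps the remaining shuffle. All names here are hypothetical, and the one-pass eager fold shown is valid only for associative, commutative reduce functions.

```python
# Illustrative sketch only; function names are hypothetical, not MaRCO's API.
from collections import Counter

def map_task(split):
    """Map phase: emit (word, count) pairs for one input split."""
    return Counter(split.split())

def lazy_reduce(map_outputs):
    """Hadoop-style: reduce runs only after ALL map outputs have arrived."""
    total = Counter()
    for partial in map_outputs:   # the full shuffle completes first
        total.update(partial)
    return total

def eager_reduce(map_output_stream):
    """MaRCO-style: fold each map output into a running partial reduction
    as soon as it arrives, overlapping reduce work with the shuffle of the
    map outputs that are still in flight."""
    running = Counter()
    for partial in map_output_stream:  # outputs arrive incrementally
        running.update(partial)        # partial (eager) reduce per arrival
    return running

splits = ["a b a", "b c", "a c c"]
outputs = [map_task(s) for s in splits]
# For an associative, commutative reduction, both orders give the same result.
assert lazy_reduce(outputs) == eager_reduce(iter(outputs))
```

In this toy both functions iterate the same list, so the benefit is invisible; the difference in the real system is timing, since eager partial reduces execute while later map tasks are still shuffling their data, hiding shuffle latency behind computation. MaRCO additionally performs a final re-reduce over the partial results, which this single-pass sketch elides.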