Joint optimization of overlapping phases in MapReduce

Authors:
Minghong Lin;Li Zhang;Adam Wierman;Jian Tan
Affiliations:
Computer Science, California Institute of Technology;IBM T.J. Watson Research Center;Computer Science, California Institute of Technology;IBM T.J. Watson Research Center
Venue:
ACM SIGMETRICS Performance Evaluation Review
Year:
2014

Abstract

MapReduce is a scalable parallel computing framework for big data processing. It exhibits multiple processing phases, and thus an efficient job scheduling mechanism is crucial for ensuring efficient resource utilization. This work studies the scheduling challenge that results from the overlapping of the "map" and "shuffle" phases in MapReduce. We propose a new, general model for this scheduling problem. Further, we prove that scheduling to minimize average response time in this model is strongly NP-hard in the offline case and that no online algorithm can be constant-competitive in the online case. However, we provide two online algorithms that match the performance of the offline optimal when given a slightly faster service rate.
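The two guarantees named in the abstract can be stated with the standard definitions from competitive analysis. The notation below ($I$, $\mathrm{ALG}_s$, $\mathrm{OPT}_1$, $c$, $\epsilon$) is assumed for illustration and is not taken from the paper; the abstract does not give the exact speed-up factor or constants.

```latex
% Assumed notation (not from the paper):
%   I          : a problem instance (a sequence of job arrivals)
%   ALG_s(I)   : average response time of an online algorithm ALG
%                when it runs with service rate s
%   OPT_1(I)   : average response time of the offline optimum at rate 1

% ALG is c-competitive if, for every instance I,
\[
  \mathrm{ALG}_1(I) \;\le\; c \cdot \mathrm{OPT}_1(I).
\]
% "No online algorithm can be constant-competitive" means that
% no constant c satisfies this bound for all instances.

% Resource augmentation: ALG is (1+\epsilon)-speed c-competitive if,
% for every instance I,
\[
  \mathrm{ALG}_{1+\epsilon}(I) \;\le\; c \cdot \mathrm{OPT}_1(I).
\]
```

Read this way, the positive result compares an online algorithm running at a slightly higher service rate against an offline optimum running at the original rate, which is how it can coexist with the lower bound for equal rates.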