Reining in the outliers in map-reduce clusters using Mantri

Authors:
Ganesh Ananthanarayanan;Srikanth Kandula;Albert Greenberg;Ion Stoica;Yi Lu;Bikas Saha;Edward Harris
Affiliations:
Microsoft Research and UC Berkeley;Microsoft Research;Microsoft Research;UC Berkeley;Microsoft Research;Microsoft Bing;Microsoft Bing
Venue:
OSDI'10 Proceedings of the 9th USENIX conference on Operating systems design and implementation
Year:
2010

Citing 25
Cited 80

Semi-Distributed Load Balancing for Massively Parallel Multicomputer Systems

IEEE Transactions on Software Engineering
LogP: towards a realistic model of parallel computation

PPOPP '93 Proceedings of the fourth ACM SIGPLAN symposium on Principles and practice of parallel programming
Analyses and optimizations for shared address space programs

Journal of Parallel and Distributed Computing - Special issue on compilation techniques for distributed memory systems
MPI-FM: high performance MPI on workstation clusters

Journal of Parallel and Distributed Computing - Special issue on workstation clusters and network-based computing
Effect of task duplication on the assignment of dependency graphs

Parallel Computing
Walking the tightrope: responsive yet stable traffic engineering

Proceedings of the 2005 conference on Applications, technologies, architectures, and protocols for computer communications
Task assignment in heterogeneous computing systems

Journal of Parallel and Distributed Computing
STAR-MPI: self tuned adaptive routines for MPI collective operations

Proceedings of the 20th annual international conference on Supercomputing
MapReduce: simplified data processing on large clusters

OSDI'04 Proceedings of the 6th conference on Symposium on Opearting Systems Design & Implementation - Volume 6
Dryad: distributed data-parallel programs from sequential building blocks

Proceedings of the 2nd ACM SIGOPS/EuroSys European Conference on Computer Systems 2007
An analysis of data corruption in the storage stack

FAST'08 Proceedings of the 6th USENIX Conference on File and Storage Technologies
A scalable, commodity data center network architecture

Proceedings of the ACM SIGCOMM 2008 conference on Data communication
SCOPE: easy and efficient parallel processing of massive data sets

Proceedings of the VLDB Endowment
MapReduce optimization using regulated dynamic prioritization

Proceedings of the eleventh international joint conference on Measurement and modeling of computer systems
A comparison of approaches to large-scale data analysis

Proceedings of the 2009 ACM SIGMOD International Conference on Management of data
VL2: a scalable and flexible data center network

Proceedings of the ACM SIGCOMM 2009 conference on Data communication
Distributed aggregation for data-parallel computing: interfaces and implementations

Proceedings of the ACM SIGOPS 22nd symposium on Operating systems principles
Quincy: fair scheduling for distributed computing clusters

Proceedings of the ACM SIGOPS 22nd symposium on Operating systems principles
The nature of data center traffic: measurements & analysis

Proceedings of the 9th ACM SIGCOMM conference on Internet measurement conference
Skew-resistant parallel processing of feature-extracting scientific user-defined functions

Proceedings of the 1st ACM symposium on Cloud computing
Making cloud intermediate data fault-tolerant

Proceedings of the 1st ACM symposium on Cloud computing
MapReduce online

NSDI'10 Proceedings of the 7th USENIX conference on Networked systems design and implementation
DryadLINQ: a system for general-purpose distributed data-parallel computing using a high-level language

OSDI'08 Proceedings of the 8th USENIX conference on Operating systems design and implementation
Improving MapReduce performance in heterogeneous environments

OSDI'08 Proceedings of the 8th USENIX conference on Operating systems design and implementation
Pipelined broadcast on ethernet switched clusters

IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing

Scarlett: coping with skewed content popularity in mapreduce clusters

Proceedings of the sixth conference on Computer systems
Sharing the data center network

Proceedings of the 8th USENIX conference on Networked systems design and implementation
Disk-locality in datacenter computing considered irrelevant

HotOS'13 Proceedings of the 13th USENIX conference on Hot topics in operating systems
Non-deterministic parallelism considered useful

HotOS'13 Proceedings of the 13th USENIX conference on Hot topics in operating systems
More intervention now!

HotOS'13 Proceedings of the 13th USENIX conference on Hot topics in operating systems
Applying idealized lower-bound runtime models to understand inefficiencies in data-intensive computing

Proceedings of the ACM SIGMETRICS joint international conference on Measurement and modeling of computer systems
Exploring MapReduce efficiency with highly-distributed data

Proceedings of the second international workshop on MapReduce and its applications
ARIA: automatic resource inference and allocation for mapreduce environments

Proceedings of the 8th ACM international conference on Autonomic computing
Applying idealized lower-bound runtime models to understand inefficiencies in data-intensive computing

ACM SIGMETRICS Performance Evaluation Review - Performance evaluation review
Managing data transfers in computer clusters with orchestra

Proceedings of the ACM SIGCOMM 2011 conference
Towards predictable datacenter networks

Proceedings of the ACM SIGCOMM 2011 conference
Disco: a computing platform for large-scale data analytics

Proceedings of the 10th ACM SIGPLAN workshop on Erlang
Purlieus: locality-aware resource allocation for MapReduce in a cloud

Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis
On the duality of data-intensive file system design: reconciling HDFS and PVFS

Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis
The price is right: towards location-independent costs in datacenters

Proceedings of the 10th ACM Workshop on Hot Topics in Networks
Mitigating the negative impact of preemption on heterogeneous MapReduce workloads

Proceedings of the 7th International Conference on Network and Services Management
Tarazu: optimizing MapReduce on heterogeneous clusters

ASPLOS XVII Proceedings of the seventeenth international conference on Architectural Support for Programming Languages and Operating Systems
Improving Hadoop performance in intercloud environments

ACM SIGMETRICS Performance Evaluation Review
Jockey: guaranteed job latency in data parallel clusters

Proceedings of the 7th ACM european conference on Computer Systems
CloudSense: continuous fine-grain cloud monitoring with compressive sensing

HotCloud'11 Proceedings of the 3rd USENIX conference on Hot topics in cloud computing
Resource provisioning framework for mapreduce jobs with performance goals

Middleware'11 Proceedings of the 12th ACM/IFIP/USENIX international conference on Middleware
Resource-aware adaptive scheduling for mapreduce clusters

Middleware'11 Proceedings of the 12th ACM/IFIP/USENIX international conference on Middleware
SkewTune: mitigating skew in mapreduce applications

SIGMOD '12 Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data
Synergy2cloud: introducing cross-sharing of application experiences into the cloud management cycle

Hot-ICE'12 Proceedings of the 2nd USENIX conference on Hot Topics in Management of Internet, Cloud, and Enterprise Networks and Services
PACMan: coordinated memory caching for parallel jobs

NSDI'12 Proceedings of the 9th USENIX conference on Networked Systems Design and Implementation
Re-optimizing data-parallel computing

NSDI'12 Proceedings of the 9th USENIX conference on Networked Systems Design and Implementation
Optimizing data shuffling in data-parallel computation by understanding user-defined functions

NSDI'12 Proceedings of the 9th USENIX conference on Networked Systems Design and Implementation
SpeQuloS: a QoS service for BoT applications using best effort distributed computing infrastructures

Proceedings of the 21st international symposium on High-Performance Parallel and Distributed Computing
Understanding the effects and implications of compute node related failures in hadoop

Proceedings of the 21st international symposium on High-Performance Parallel and Distributed Computing
CAM: a topology aware minimum cost flow based resource manager for MapReduce applications in the cloud

Proceedings of the 21st international symposium on High-Performance Parallel and Distributed Computing
Near-optimal scheduling mechanisms for deadline-sensitive jobs in large computing clusters

Proceedings of the twenty-fourth annual ACM symposium on Parallelism in algorithms and architectures
The only constant is change: incorporating time-varying network reservations in data centers

Proceedings of the ACM SIGCOMM 2012 conference on Applications, technologies, architectures, and protocols for computer communication
Programming your network at run-time for big data applications

Proceedings of the first workshop on Hot topics in software defined networks
The seven deadly sins of cloud computing research

HotCloud'12 Proceedings of the 4th USENIX conference on Hot Topics in Cloud Ccomputing
Why let resources idle? aggressive cloning of jobs with dolly

HotCloud'12 Proceedings of the 4th USENIX conference on Hot Topics in Cloud Ccomputing
Predicting execution bottlenecks in map-reduce clusters

HotCloud'12 Proceedings of the 4th USENIX conference on Hot Topics in Cloud Ccomputing
Server-assisted latency management for wide-area distributed systems

USENIX ATC'12 Proceedings of the 2012 USENIX conference on Annual Technical Conference
Interactive analytical processing in big data systems: a cross-industry study of MapReduce workloads

Proceedings of the VLDB Endowment
The only constant is change: incorporating time-varying network reservations in data centers

ACM SIGCOMM Computer Communication Review - Special october issue SIGCOMM '12
Hierarchical merge for scalable MapReduce

Proceedings of the 2012 workshop on Management of big data systems
SCOPE: parallel databases meet MapReduce

The VLDB Journal — The International Journal on Very Large Data Bases
Flat datacenter storage

OSDI'12 Proceedings of the 10th USENIX conference on Operating Systems Design and Implementation
Bridging the tenant-provider gap in cloud services

Proceedings of the Third ACM Symposium on Cloud Computing
True elasticity in multi-tenant data-intensive compute clusters

Proceedings of the Third ACM Symposium on Cloud Computing
Designing good algorithms for MapReduce and beyond

Proceedings of the Third ACM Symposium on Cloud Computing
Resource provisioning framework for MapReduce jobs with performance goals

Proceedings of the 12th International Middleware Conference
Resource-aware adaptive scheduling for MapReduce clusters

Proceedings of the 12th International Middleware Conference
Cogset: a high performance MapReduce engine

Concurrency and Computation: Practice & Experience
Theia: visual signatures for problem diagnosis in large hadoop clusters

lisa'12 Proceedings of the 26th international conference on Large Installation System Administration: strategies, tools, and techniques
Breaking the MapReduce stage barrier

Cluster Computing
A study of unpredictability in fault-tolerant middleware

Computer Networks: The International Journal of Computer and Telecommunications Networking
Interference and locality-aware task scheduling for MapReduce applications in virtual clusters

Proceedings of the 22nd international symposium on High-performance parallel and distributed computing
Fast data in the era of big data: Twitter's real-time related query suggestion architecture

Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data
Optimus: a dynamic rewriting framework for data-parallel execution plans

Proceedings of the 8th ACM European Conference on Computer Systems
BlinkDB: queries with bounded errors and bounded response times on very large data

Proceedings of the 8th ACM European Conference on Computer Systems
Presto: distributed machine learning and graph processing with sparse matrices

Proceedings of the 8th ACM European Conference on Computer Systems
CPI2: CPU performance isolation for shared compute clusters

Proceedings of the 8th ACM European Conference on Computer Systems
Effective straggler mitigation: attack of the clones

nsdi'13 Proceedings of the 10th USENIX conference on Networked Systems Design and Implementation
Split/merge: system support for elastic execution in virtual middleboxes

nsdi'13 Proceedings of the 10th USENIX conference on Networked Systems Design and Implementation
EyeQ: practical network performance isolation at the edge

nsdi'13 Proceedings of the 10th USENIX conference on Networked Systems Design and Implementation
Rhea: automatic filtering for unstructured cloud storage

nsdi'13 Proceedings of the 10th USENIX conference on Networked Systems Design and Implementation
Leveraging endpoint flexibility in data-intensive clusters

Proceedings of the ACM SIGCOMM 2013 conference on SIGCOMM
Speeding up distributed request-response workflows

Proceedings of the ACM SIGCOMM 2013 conference on SIGCOMM
MapReduce with communication overlap (MaRCO)

Journal of Parallel and Distributed Computing
The case for tiny tasks in compute clusters

HotOS'13 Proceedings of the 14th USENIX conference on Hot Topics in Operating Systems
Solving the straggler problem with bounded staleness

HotOS'13 Proceedings of the 14th USENIX conference on Hot Topics in Operating Systems
Mammoth: autonomic data processing framework for scientific state-transition applications

Proceedings of the 2013 ACM Cloud and Autonomic Computing Conference
Distributed data management using MapReduce

ACM Computing Surveys (CSUR)
Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles

ACM SIGOPS 24th Symposium on Operating Systems Principles
Sparrow: distributed, low latency scheduling

Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles
Natjam: design and evaluation of eviction policies for supporting priorities and deadlines in mapreduce clusters

Proceedings of the 4th annual Symposium on Cloud Computing
Limplock: understanding the impact of limpware on scale-out cloud systems

Proceedings of the 4th annual Symposium on Cloud Computing
Joint optimization of overlapping phases in MapReduce

Performance Evaluation
PIKACHU: how to rebalance load in optimizing mapreduce on heterogeneous clusters

USENIX ATC'13 Proceedings of the 2013 USENIX conference on Annual Technical Conference
Hadoop's adolescence: an analysis of Hadoop usage in scientific workloads

Proceedings of the VLDB Endowment
Quasar: resource-efficient and QoS-aware cluster management

Proceedings of the 19th international conference on Architectural support for programming languages and operating systems
Performance troubleshooting in data centers: an annotated bibliography?

ACM SIGOPS Operating Systems Review
MapReduce "garbage" collection

CASCON '13 Proceedings of the 2013 Conference of the Center for Advanced Studies on Collaborative Research
SpeQuloS: a QoS service for hybrid and elastic computing infrastructures

Cluster Computing
GRASS: trimming stragglers in approximation analytics

NSDI'14 Proceedings of the 11th USENIX Conference on Networked Systems Design and Implementation

Quantified Score

Hi-index	0.00

Visualization

Abstract

Experience froman operational Map-Reduce cluster reveals that outliers significantly prolong job completion. The causes for outliers include run-time contention for processor, memory and other resources, disk failures, varying bandwidth and congestion along network paths and, imbalance in task workload. We present Mantri, a system that monitors tasks and culls outliers using cause- and resource-aware techniques. Mantri's strategies include restarting outliers, network-aware placement of tasks and protecting outputs of valuable tasks. Using real-time progress reports, Mantri detects and acts on outliers early in their lifetime. Early action frees up resources that can be used by subsequent tasks and expedites the job overall. Acting based on the causes and the resource and opportunity cost of actions lets Mantri improve over prior work that only duplicates the laggards. Deployment in Bing's production clusters and trace-driven simulations show that Mantri improves job completion times by 32%.