Quincy: fair scheduling for distributed computing clusters

  • Authors:
  • Michael Isard; Vijayan Prabhakaran; Jon Currey; Udi Wieder; Kunal Talwar; Andrew Goldberg

  • Affiliations:
  • Microsoft Corporation, Mountain View, CA, USA (all authors)

  • Venue:
  • Proceedings of the ACM SIGOPS 22nd Symposium on Operating Systems Principles (SOSP '09)
  • Year:
  • 2009

Abstract

This paper addresses the problem of scheduling concurrent jobs on clusters where application data is stored on the computing nodes. This setting, in which scheduling computations close to their data is crucial for performance, is increasingly common and arises in systems such as MapReduce, Hadoop, and Dryad as well as many grid-computing environments. We argue that data-intensive computation benefits from a fine-grain resource sharing model that differs from the coarser semi-static resource allocations implemented by most existing cluster computing architectures. The problem of scheduling with locality and fairness constraints has not previously been extensively studied under this resource-sharing model. We introduce a powerful and flexible new framework for scheduling concurrent distributed jobs with fine-grain resource sharing. The scheduling problem is mapped to a graph data structure, where edge weights and capacities encode the competing demands of data locality, fairness, and starvation-freedom, and a standard solver computes the optimal online schedule according to a global cost model. We evaluate our implementation of this framework, which we call Quincy, on a cluster of a few hundred computers using a varied workload of data- and CPU-intensive jobs. We evaluate Quincy against an existing queue-based algorithm and implement several policies for each scheduler, with and without fairness constraints. Quincy achieves better fairness when fairness is requested, while substantially improving data locality. The volume of data transferred across the cluster is reduced by up to a factor of 3.9 in our experiments, leading to a throughput increase of up to 40%.
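To make the graph encoding described above concrete, the following is a minimal, hypothetical Python sketch: each task injects one unit of flow that must reach a sink either through a machine (the task is scheduled there) or through an "unscheduled" node, with edge costs standing in for data locality and the penalty for leaving work queued. It uses the generic min-cost-flow solver in networkx; the cost constants, node layout, and input format are illustrative assumptions and omit the paper's rack- and cluster-level aggregation nodes and its actual cost model.

import networkx as nx

def schedule(tasks, machines, locality,
             local_cost=1, remote_cost=5, unscheduled_cost=10):
    # Build the flow network: each task supplies one unit of flow, which
    # must reach the sink either through a machine or through a per-task
    # "unscheduled" node.
    G = nx.DiGraph()
    sink = "SINK"
    G.add_node(sink, demand=len(tasks))
    for t in tasks:
        G.add_node(t, demand=-1)
        # Leaving a task unscheduled is permitted but costly; Quincy obtains
        # starvation-freedom by letting this cost grow with the time a task
        # has waited (not modeled here).
        G.add_edge(t, ("UNSCHED", t), capacity=1, weight=unscheduled_cost)
        G.add_edge(("UNSCHED", t), sink, capacity=1, weight=0)
        for m in machines:
            # Data locality: running where the task's input data lives is cheaper.
            w = local_cost if m in locality.get(t, set()) else remote_cost
            G.add_edge(t, m, capacity=1, weight=w)
    for m in machines:
        # One task per machine in this toy example.
        G.add_edge(m, sink, capacity=1, weight=0)

    # A standard min-cost-flow solver yields the globally cheapest assignment.
    flow = nx.min_cost_flow(G)
    return {t: v for t in tasks for v, f in flow[t].items()
            if f > 0 and v in machines}

# Toy run: t1's input data lives on m2, so the solver places it there.
print(schedule(["t1", "t2"], ["m1", "m2"], {"t1": {"m2"}}))

Because the solver optimizes a single global cost, changing the relative edge weights (for example, raising the unscheduled cost over time, or adding per-job capacity limits) changes the trade-off between locality, fairness, and waiting without altering the scheduling algorithm itself.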