FLEX: a slot allocation scheduling optimizer for MapReduce workloads

Authors:
Joel Wolf;Deepak Rajan;Kirsten Hildrum;Rohit Khandekar;Vibhore Kumar;Sujay Parekh;Kun-Lung Wu;Andrey Balmin
Affiliations:
IBM Watson Research Center, Hawthorne, NY;IBM Watson Research Center, Hawthorne, NY;IBM Watson Research Center, Hawthorne, NY;IBM Watson Research Center, Hawthorne, NY;IBM Watson Research Center, Hawthorne, NY;IBM Watson Research Center, Hawthorne, NY;IBM Watson Research Center, Hawthorne, NY;IBM Almaden Research Center, San Jose, CA
Venue:
Proceedings of the ACM/IFIP/USENIX 11th International Conference on Middleware
Year:
2010

Abstract

Originally, MapReduce implementations such as Hadoop employed First In First Out (FIFO) scheduling, but such simple schemes cause job starvation. The Hadoop Fair Scheduler (HFS) is a slot-based MapReduce scheme designed to ensure a degree of fairness among the jobs, by guaranteeing each job at least some minimum number of allocated slots. Our prime contribution in this paper is a different, flexible scheduling allocation scheme, known as FLEX. Our goal is to optimize any of a variety of standard scheduling theory metrics (response time, stretch, makespan and Service Level Agreements (SLAs), among others) while ensuring the same minimum job slot guarantees as in HFS, and maximum job slot guarantees as well. The FLEX allocation scheduler can be regarded as an add-on module that works synergistically with HFS. We describe the mathematical basis for FLEX, and compare it with FIFO and HFS in a variety of experiments.
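
The metrics and slot guarantees named in the abstract can be illustrated with a minimal Python sketch. This is not the FLEX algorithm, which the abstract does not specify; it only shows textbook definitions of response time, stretch, and makespan, plus a check that an allocation respects per-job minimum and maximum slot counts. The `Job` fields, the function names, and the use of a job's stand-alone processing time (`work`) as the denominator for stretch are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Job:
    # All fields are hypothetical names chosen for this sketch.
    name: str
    arrival: float     # time the job entered the system
    completion: float  # time the job finished
    work: float        # assumed stand-alone processing time
    min_slots: int     # minimum slot guarantee
    max_slots: int     # maximum slot guarantee


def response_time(job: Job) -> float:
    """Elapsed time from arrival to completion."""
    return job.completion - job.arrival


def stretch(job: Job) -> float:
    """Response time normalized by the job's stand-alone processing time."""
    return response_time(job) / job.work


def makespan(jobs: list[Job]) -> float:
    """Completion time of the last job to finish."""
    return max(j.completion for j in jobs)


def allocation_is_feasible(jobs: list[Job], slots: dict[str, int], total_slots: int) -> bool:
    """True if every job is within its [min, max] bounds and the cluster is not oversubscribed."""
    within_bounds = all(j.min_slots <= slots[j.name] <= j.max_slots for j in jobs)
    return within_bounds and sum(slots.values()) <= total_slots


jobs = [
    Job("a", arrival=0.0, completion=10.0, work=5.0, min_slots=1, max_slots=4),
    Job("b", arrival=2.0, completion=8.0, work=3.0, min_slots=1, max_slots=4),
]
```

A scheduler that optimizes one of these metrics would search over allocations for which `allocation_is_feasible` holds; how FLEX performs that search is described in the paper itself, not here.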