Interference and locality-aware task scheduling for MapReduce applications in virtual clusters

Authors:
Xiangping Bu; Jia Rao; Cheng-zhong Xu
Affiliations:
Wayne State University, Detroit, MI, USA; University of Colorado at Colorado Springs, Colorado Springs, CO, USA; Wayne State University, Detroit, MI, USA
Venue:
Proceedings of the 22nd international symposium on High-performance parallel and distributed computing
Year:
2013


Abstract

MapReduce has emerged as an important distributed programming paradigm for large-scale applications, and running MapReduce applications in clouds presents an attractive usage model for enterprises. In a virtual MapReduce cluster, however, interference between virtual machines (VMs) degrades the performance of map and reduce tasks and renders existing data-locality-aware task scheduling policies, such as delay scheduling, ineffective. On the other hand, virtualization offers an extra opportunity for data locality among co-hosted VMs. In this paper, we present a task scheduling strategy that mitigates interference while preserving task data locality for MapReduce applications. The strategy includes an interference-aware scheduling policy, based on a task performance prediction model, and an adaptive delay scheduling algorithm for improving data locality. We implemented the interference and locality-aware (ILA) scheduling strategy in a virtual MapReduce framework and evaluated its effectiveness and efficiency on a 72-node Xen-based virtual cluster. Experimental results with 10 representative CPU- and IO-intensive applications show that ILA achieves a speedup of 1.5 to 6.5 times for individual jobs and yields an improvement of up to 1.9 times in system throughput in comparison with four other MapReduce schedulers.
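
The abstract names two ingredients, an interference check driven by a performance prediction and delay scheduling for locality, without giving their details. The Python sketch below is only an illustration of how such a decision could be combined for a single free slot; it is not the ILA algorithm from the paper. The names `Task`, `Job`, `pick_task`, `predict_slowdown`, `max_slowdown`, and `max_skips` are all hypothetical, the fixed skip threshold is the classic (non-adaptive) form of delay scheduling, and locality among co-hosted VMs is not modeled.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Set


@dataclass
class Task:
    name: str
    preferred_nodes: Set[str]  # nodes (VMs) that hold this task's input data


@dataclass
class Job:
    name: str
    pending: List[Task]
    skips: int = 0  # scheduling opportunities declined while waiting for locality


def pick_task(
    job: Job,
    node: str,
    predict_slowdown: Callable[[Task, str], float],  # assumed prediction model
    max_slowdown: float = 1.5,  # assumed tolerance, not a value from the paper
    max_skips: int = 3,         # assumed fixed delay threshold
) -> Optional[Task]:
    """Return a task to run on the free slot at `node`, or None to decline it."""
    # Interference check: drop tasks predicted to slow down too much on this node.
    acceptable = [t for t in job.pending if predict_slowdown(t, node) <= max_slowdown]
    if not acceptable:
        return None  # declined because of interference; the skip counter is untouched

    # Delay scheduling: prefer a data-local task, otherwise wait a bounded number of turns.
    local = [t for t in acceptable if node in t.preferred_nodes]
    if local:
        chosen = local[0]
        job.skips = 0
    elif job.skips >= max_skips:
        chosen = acceptable[0]  # waited long enough; accept a non-local slot
    else:
        job.skips += 1
        return None

    job.pending.remove(chosen)
    return chosen
```

A scheduler loop would call `pick_task` whenever a node reports a free slot. An adaptive variant would adjust `max_skips` at run time rather than fixing it; how ILA does this is not stated in the abstract.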