Enhancement of Xen's scheduler for MapReduce workloads

Authors:
Hui Kang; Yao Chen; Jennifer L. Wong; Radu Sion; Jason Wu
Affiliations:
Stony Brook University, Stony Brook, NY, USA; Stony Brook University, Stony Brook, NY, USA; Stony Brook University, Stony Brook, NY, USA; Stony Brook University, Stony Brook, NY, USA; Cornell University, Ithaca, NY, USA
Venue:
Proceedings of the 20th international symposium on High performance distributed computing
Year:
2011

Abstract

As trends move toward data outsourcing and cloud computing, the efficiency of distributed data centers becomes increasingly important. Cloud-based services such as Amazon's EC2 rely on virtual machines (VMs) to host MapReduce clusters for large-scale data processing. However, current VM scheduling does not provide adequate support for MapReduce workloads, resulting in degraded overall performance. For example, when multiple MapReduce clusters run on a single physical machine, the existing VMM scheduler does not guarantee fairness across clusters. In this work, we present the MapReduce Group Scheduler (MRG). The MRG scheduler implements three mechanisms to improve the efficiency and fairness of the existing VMM scheduler. First, the characteristics of MapReduce workloads facilitate batching of I/O requests from VMs working on the same job, which reduces the number of context switches and brings other benefits. Second, because most MapReduce workloads incur a significant number of I/O blocking events and the completion of a job depends on the progress of all nodes, we propose a two-level scheduling policy to achieve proportional fair sharing across both MapReduce clusters and individual VMs. Finally, the proposed MRG scheduler also operates on symmetric multi-processor (SMP) enabled platforms. The key to these improvements is to group the scheduling of VMs belonging to the same MapReduce cluster. We have implemented the proposed scheduler by modifying the existing Xen hypervisor and evaluated its performance on Hadoop, an open-source implementation of MapReduce. Our evaluations, using four representative MapReduce benchmarks, show that the proposed scheduler reduces context-switch overhead and achieves increased proportional fairness across multiple MapReduce clusters, without penalizing the completion time of MapReduce jobs.
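
The two-level proportional-share idea mentioned in the abstract can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the MRG implementation: the names `VM`, `Cluster`, `pick_next`, and `simulate`, the virtual-time accounting, and the `slice_ms` value are all hypothetical, and I/O blocking, I/O batching, and SMP placement are not modeled. It first picks the cluster with the least weighted CPU time consumed, then the VM with the least weighted CPU time inside that cluster.

```python
from dataclasses import dataclass, field


@dataclass
class VM:
    name: str
    weight: int = 1
    vtime: float = 0.0  # weighted CPU time consumed so far (hypothetical accounting)


@dataclass
class Cluster:
    name: str
    weight: int = 1
    vms: list = field(default_factory=list)
    vtime: float = 0.0  # weighted CPU time consumed by the whole cluster


def pick_next(clusters, slice_ms=30.0):
    """Level 1: choose the cluster with the smallest virtual time.
    Level 2: choose the VM with the smallest virtual time in that cluster.
    slice_ms is an arbitrary illustrative value."""
    cluster = min(clusters, key=lambda c: c.vtime)
    vm = min(cluster.vms, key=lambda v: v.vtime)
    # Charge the slice at both levels; a larger weight advances virtual
    # time more slowly, so that entity is selected more often.
    cluster.vtime += slice_ms / cluster.weight
    vm.vtime += slice_ms / vm.weight
    return cluster.name, vm.name


def simulate(clusters, rounds):
    """Run `rounds` scheduling decisions and count slices per cluster and per VM."""
    per_cluster, per_vm = {}, {}
    for _ in range(rounds):
        c, v = pick_next(clusters)
        per_cluster[c] = per_cluster.get(c, 0) + 1
        per_vm[v] = per_vm.get(v, 0) + 1
    return per_cluster, per_vm


if __name__ == "__main__":
    a = Cluster("A", weight=2, vms=[VM("a1"), VM("a2")])
    b = Cluster("B", weight=1, vms=[VM("b1"), VM("b2"), VM("b3")])
    print(simulate([a, b], 300))
```

In this toy setup, cluster A has twice the weight of cluster B and therefore receives twice as many slices, regardless of how many VMs each cluster contains; within a cluster, equally weighted VMs share that cluster's slices evenly.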