Enhancement of Xen's scheduler for MapReduce workloads

  • Authors:
  • Hui Kang;Yao Chen;Jennifer L. Wong;Radu Sion;Jason Wu

  • Affiliations:
  • Stony Brook University, Stony Brook, NY, USA;Stony Brook University, Stony Brook, NY, USA;Stony Brook University, Stony Brook, NY, USA;Stony Brook University, Stony Brook, NY, USA;Cornell University, Ithaca, NY, USA

  • Venue:
  • Proceedings of the 20th international symposium on High performance distributed computing
  • Year:
  • 2011

Quantified Score

Hi-index 0.00

Visualization

Abstract

As the trends move towards data outsourcing and cloud computing, the efficiency of distributed data centers increases in importance. Cloud-based services such as Amazon's EC2 rely on virtual machines (VMs) to host MapReduce clusters for large data processing. However, current VM scheduling does not provide adequate support for MapReduce workloads, resulting in degraded overall performance. For example, when multiple MapReduce clusters run on a single physical machine, the existing VMMscheduler does not guarantee fairness across clusters. In this work, we present theMapReduce Group Scheduler (MRG). The MRG scheduler implements three mechanisms to improve the efficiency and fairness of the existing VMM scheduler. First, the characteristics of MapReduce workloads facilitate batching of I/O requests from VMs working on the same job, which reduces the number of context switches and brings other benefits. Second, because most MapReduce workloads incur a significant amount of I/O blocking events and the completion of a job depends on the progress of all nodes, we propose a two-level scheduling policy to achieve proportional fair sharing across both MapReduce clusters and individual VMs. Finally, the proposed MRG scheduler also operates on symmetric multi-processor (SMP) enabled platforms. The key to these improvements is to group the scheduling of VMs belonging to the same MapReduce cluster. We have implemented the proposed scheduler by modifying the existing Xen hypervisor and evaluated the performance on Hadoop, an open source implementation of MapReduce. Our evaluations, using four representative MapReduce benchmarks, show that the proposed scheduler reduces context switch overhead and achieves increased proportional fairness across multiple MapReduce clusters, without penalizing the completion time of MapReduce jobs.