A Dynamic MapReduce Scheduler for Heterogeneous Workloads

  • Authors:
  • Chao Tian; Haojie Zhou; Yongqiang He; Li Zha

  • Venue:
  • GCC '09 Proceedings of the 2009 Eighth International Conference on Grid and Cooperative Computing
  • Year:
  • 2009

Abstract

MapReduce is an important programming model for data centers containing tens of thousands of nodes. In a practical data center of that scale, I/O-bound jobs and CPU-bound jobs, which demand different resources, commonly run simultaneously in the same cluster. The MapReduce framework has not considered how to run these two kinds of jobs in parallel. In this paper, we give a new view of the MapReduce model and classify MapReduce workloads into three categories based on their CPU and I/O utilization. Building on this classification, we design a new dynamic MapReduce workload prediction mechanism, MR-Predict, which detects the workload type on the fly. We propose a Triple-Queue Scheduler based on the MR-Predict mechanism. The Triple-Queue Scheduler can improve the usage of both CPU and disk I/O resources, and can improve Hadoop throughput by about 30%, under heterogeneous workloads.
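
The idea of classifying jobs by resource usage and dispatching them from three queues can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the category labels (`cpu_bound`, `io_bound`, `mixed`), the utilization thresholds, the `JobSample` fields, and the round-robin dispatch policy are all hypothetical. The abstract does not state the actual classification criteria of MR-Predict or the queueing policy of the Triple-Queue Scheduler.

```python
from collections import deque
from dataclasses import dataclass

# Hypothetical thresholds; the abstract does not give the real criteria.
CPU_BOUND_MIN_CPU = 0.80  # fraction of CPU capacity in use
IO_BOUND_MIN_IO = 0.80    # fraction of disk bandwidth in use


@dataclass
class JobSample:
    """Utilization observed for a job while it runs (hypothetical fields)."""
    name: str
    cpu_util: float  # 0.0 to 1.0
    io_util: float   # 0.0 to 1.0


def classify(sample: JobSample) -> str:
    """Assign one of three illustrative labels from measured utilization."""
    if sample.io_util >= IO_BOUND_MIN_IO and sample.io_util >= sample.cpu_util:
        return "io_bound"
    if sample.cpu_util >= CPU_BOUND_MIN_CPU:
        return "cpu_bound"
    return "mixed"


class ThreeQueueSketch:
    """Keeps one FIFO queue per label and serves the queues in rotation."""

    def __init__(self) -> None:
        self._order = ["cpu_bound", "io_bound", "mixed"]
        self.queues = {label: deque() for label in self._order}
        self._next = 0  # index of the queue to try first on the next dispatch

    def submit(self, sample: JobSample) -> str:
        """Classify the job, enqueue it, and return the label it received."""
        label = classify(sample)
        self.queues[label].append(sample.name)
        return label

    def next_job(self):
        """Return the next job name, or None if every queue is empty.

        Rotating over the queues interleaves CPU-heavy and I/O-heavy jobs,
        so neither resource sits idle while the other is saturated.
        """
        n = len(self._order)
        for i in range(n):
            label = self._order[(self._next + i) % n]
            if self.queues[label]:
                self._next = (self._next + i + 1) % n
                return self.queues[label].popleft()
        return None


if __name__ == "__main__":
    sched = ThreeQueueSketch()
    for job in (JobSample("sort", 0.30, 0.90), JobSample("pi", 0.95, 0.10)):
        print(job.name, "->", sched.submit(job))
    print("dispatch:", sched.next_job(), sched.next_job())
```

In this sketch a job is labeled once from a single utilization sample. A mechanism that detects the workload type on the fly, as the abstract describes, would presumably re-evaluate the label as the job runs.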