HybridMR: A Hierarchical MapReduce Scheduler for Hybrid Data Centers

  • Authors:
  • Bikash Sharma;Timothy Wood;Chita R. Das

  • Affiliations:
  • -;-;-

  • Venue:
  • ICDCS '13 Proceedings of the 2013 IEEE 33rd International Conference on Distributed Computing Systems
  • Year:
  • 2013

Quantified Score

Hi-index 0.00

Visualization

Abstract

Virtualized environments are attractive because they simplify cluster management, while facilitating cost-effective workload consolidation. As a result, virtual machines in public clouds or private data centers, have become the norm for running transactional applications like web services and virtual desktops. On the other hand, batch workloads like MapReduce, are typically deployed in a native cluster to avoid the performance overheads of virtualization. While both these virtual and native environments have their own strengths and weaknesses, we demonstrate in this work that it is feasible to provide the best of these two computing paradigms in a hybrid platform. In this paper, we make a case for a hybrid data center consisting of native and virtual environments, and propose a 2-phase hierarchical scheduler, called HybridMR, for the effective resource management of interactive and batch workloads. In the first phase, HybridMR classifies incoming MapReduce jobs based on the expected virtualization overheads, and uses this information to automatically guide placement between physical and virtual machines. In the second phase, HybridMR manages the run-time performance of MapReduce jobs collocated with interactive applications in order to provide best effort delivery to batch jobs, while complying with the Service Level Agreements (SLAs) of interactive applications. By consolidating batch jobs with over-provisioned foreground applications, the available unused resources are better utilized, resulting in improved application performance and energy efficiency. Evaluations on a hybrid cluster consisting of 24 physical servers and 48 virtual machines, with diverse workload mix of interactive and batch MapReduce applications, demonstrate that HybridMR can achieve up to 40% improvement in the completion times of MapReduce jobs, over the virtual-only case, while complying with the SLAs of interactive applications. Compared to the native-only cluster, at the cost of minimal performance penalty, HybridMR boosts resource utilization by 45%, and achieves up to 43% energy savings. These results indicate that a hybrid data center with an efficient scheduling mechanism can provide a cost-effective solution for hosting both batch and interactive workloads.