Resource Management for Dynamic MapReduce Clusters in Multicluster Systems

  • Authors:
  • Bogdan Ghit;Nezih Yigitbasi;Dick Epema

  • Venue:
  • SCC '12 Proceedings of the 2012 SC Companion: High Performance Computing, Networking Storage and Analysis
  • Year:
  • 2012

Abstract

State-of-the-art MapReduce frameworks such as Hadoop can easily scale up to thousands of machines and to large numbers of users. Nevertheless, some users may require isolated environments to develop their applications and to process their data, which calls for multiple deployments of MapReduce clusters within the same physical infrastructure. In this paper, we design and implement a resource management system to facilitate the on-demand, isolated deployment of MapReduce clusters in multicluster systems. Deploying multiple MapReduce clusters enables four types of isolation: with respect to performance, data management, fault tolerance, and versioning. To efficiently manage the underlying physical resources, we propose three provisioning policies for dynamically resizing MapReduce clusters, and we evaluate the performance of our system through experiments on a real multicluster system.