Packing the most onto your cloud

  • Authors:
  • Ashraf Aboulnaga, Ziyu Wang, Zi Ye Zhang

  • Affiliations:
  • University of Waterloo, Waterloo, ON, Canada (all authors)

  • Venue:
  • Proceedings of the first international workshop on Cloud data management
  • Year:
  • 2009

Abstract

Parallel dataflow programming frameworks such as Map-Reduce are increasingly being used for large-scale data analysis on computing clouds. It is therefore becoming important to automatically optimize the performance of these frameworks. In this paper, we address one particular optimization problem, namely scheduling sets of Map-Reduce jobs on a cluster of machines. We present a scheduler that takes job characteristics into account and finds a schedule that minimizes the total completion time of the set of jobs. Our scheduler decides how many machines to assign to each job, and it packs as many jobs onto the machines as their resources can support. To enable flexible assignment of jobs to machines, we run the Map-Reduce jobs in virtual machines. We formulate the scheduling problem as a constrained optimization problem, and we demonstrate experimentally, using the Hadoop open-source Map-Reduce implementation, that solving this problem yields performance benefits of up to 30%.
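
The core idea described in the abstract, choosing how many machines each job receives, subject to the cluster's capacity, so that the total completion time of the job set is minimized, can be illustrated with a small toy model. The Python sketch below assumes a simple job model (divisible work plus a fixed startup overhead) and an exhaustive search over allocations; the job names, numbers, and cost model are hypothetical illustrations, not the paper's actual formulation.

```python
# Minimal sketch: allocate machines to concurrently running Map-Reduce jobs
# so that the sum of their completion times is minimized, without exceeding
# the cluster's machine count. All parameters below are assumed for the example.
from itertools import product

CLUSTER_MACHINES = 8  # total machines in the cluster (assumed)

# (job name, total work in machine-hours, fixed startup overhead in hours)
JOBS = [
    ("sort", 12.0, 0.50),
    ("grep",  4.0, 0.25),
    ("join",  8.0, 0.50),
]

def completion_time(work, overhead, machines):
    """Estimated running time of one job when given `machines` machines."""
    return overhead + work / machines

def best_allocation(jobs, total_machines):
    """Exhaustively search allocations that fit in the cluster and minimize
    the total completion time of the job set (jobs run concurrently)."""
    best = None
    choices = range(1, total_machines + 1)
    for alloc in product(choices, repeat=len(jobs)):
        if sum(alloc) > total_machines:  # resource (capacity) constraint
            continue
        total = sum(completion_time(work, overhead, m)
                    for (_, work, overhead), m in zip(jobs, alloc))
        if best is None or total < best[0]:
            best = (total, alloc)
    return best

if __name__ == "__main__":
    total, alloc = best_allocation(JOBS, CLUSTER_MACHINES)
    for (name, _, _), m in zip(JOBS, alloc):
        print(f"{name}: {m} machine(s)")
    print(f"total completion time: {total:.2f} h")
```

The exhaustive search here is only for clarity on a handful of jobs; the paper itself formulates the problem as a constrained optimization and evaluates its solution on Hadoop, where the search space and cost model are richer than in this sketch.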