Packing the most onto your cloud

  • Authors:
  • Ashraf Aboulnaga, Ziyu Wang, Zi Ye Zhang

  • Affiliations:
  • University of Waterloo, Waterloo, ON, Canada (all authors)

  • Venue:
  • Proceedings of the first international workshop on Cloud data management
  • Year:
  • 2009

Abstract

Parallel dataflow programming frameworks such as Map-Reduce are increasingly being used for large-scale data analysis on computing clouds. It is therefore becoming important to automatically optimize the performance of these frameworks. In this paper, we address one particular optimization problem, namely scheduling sets of Map-Reduce jobs on a cluster of machines. We present a scheduler that takes job characteristics into account and finds a schedule that minimizes the total completion time of the set of jobs. Our scheduler decides how many machines to assign to each job, and it packs as many jobs onto the machines as their resources can support. To enable flexible assignment of jobs to machines, we run the Map-Reduce jobs in virtual machines. We formulate the scheduling problem as a constrained optimization problem, and we demonstrate experimentally, using the Hadoop open-source Map-Reduce implementation, that solving this problem yields performance benefits of up to 30%.
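
The core idea described in the abstract, choosing how many machines each job receives, subject to the cluster's capacity, so that the total completion time of the job set is minimized, can be illustrated with a small toy model. The Python sketch below assumes a simple job model (divisible work plus a fixed startup overhead) and an exhaustive search over allocations; the job names, numbers, and cost model are hypothetical illustrations, not the paper's actual formulation.

```python
# Minimal sketch: allocate machines to concurrently running Map-Reduce jobs
# so that the sum of their completion times is minimized, without exceeding
# the cluster's machine count. All parameters below are assumed for the example.
from itertools import product

CLUSTER_MACHINES = 8  # total machines in the cluster (assumed)

# (job name, total work in machine-hours, fixed startup overhead in hours)
JOBS = [
    ("sort", 12.0, 0.50),
    ("grep",  4.0, 0.25),
    ("join",  8.0, 0.50),
]

def completion_time(work, overhead, machines):
    """Estimated running time of one job when given `machines` machines."""
    return overhead + work / machines

def best_allocation(jobs, total_machines):
    """Exhaustively search allocations that fit in the cluster and minimize
    the total completion time of the job set (jobs run concurrently)."""
    best = None
    choices = range(1, total_machines + 1)
    for alloc in product(choices, repeat=len(jobs)):
        if sum(alloc) > total_machines:  # resource (capacity) constraint
            continue
        total = sum(completion_time(work, overhead, m)
                    for (_, work, overhead), m in zip(jobs, alloc))
        if best is None or total < best[0]:
            best = (total, alloc)
    return best

if __name__ == "__main__":
    total, alloc = best_allocation(JOBS, CLUSTER_MACHINES)
    for (name, _, _), m in zip(JOBS, alloc):
        print(f"{name}: {m} machine(s)")
    print(f"total completion time: {total:.2f} h")
```

The exhaustive search here is only for clarity on a handful of jobs; the paper itself formulates the problem as a constrained optimization and evaluates its solution on Hadoop, where the search space and cost model are richer than in this sketch.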