Parallel dataflow programming frameworks such as Map-Reduce are increasingly used for large-scale data analysis on computing clouds, so automatically optimizing the performance of these frameworks is becoming important. In this paper, we address one particular optimization problem: scheduling sets of Map-Reduce jobs on a cluster of machines. We present a scheduler that takes job characteristics into account and finds a schedule that minimizes the total completion time of the set of jobs. Our scheduler decides how many machines to assign to each job, and it tries to pack as many jobs onto the machines as the machine resources can support. To enable flexible assignment of jobs to machines, we run the Map-Reduce jobs in virtual machines. We formulate the scheduling problem as a constrained optimization problem, and we demonstrate experimentally, using the open-source Hadoop implementation of Map-Reduce, that solving this problem yields benefits of up to 30%.
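To make the idea of machine assignment as constrained optimization concrete, the following is a minimal sketch and not the formulation used in the paper. It assumes a hypothetical runtime model (work divided across machines plus a per-machine coordination overhead), assumes all jobs run concurrently, and uses brute-force search to choose a machine count per job that minimizes the completion time of the whole set, subject to a cluster-size constraint. The names `runtime` and `best_allocation`, and the model parameters, are illustrative assumptions.

```python
from itertools import product


def runtime(work, overhead, machines):
    """Hypothetical runtime model (an assumption, not taken from the paper):
    the work is split evenly across machines, and each additional machine
    adds a fixed coordination overhead."""
    return work / machines + overhead * machines


def best_allocation(jobs, total_machines):
    """Return (makespan, allocation) minimizing the completion time of the set.

    jobs: list of (work, overhead) tuples, one per job.
    total_machines: cluster size; the allocation must satisfy
        sum(allocation) <= total_machines, with at least 1 machine per job.
    All jobs are assumed to start together, so the set finishes when the
    slowest job finishes.
    """
    best = None
    # Enumerate every assignment of 1..total_machines machines to each job.
    for alloc in product(range(1, total_machines + 1), repeat=len(jobs)):
        if sum(alloc) > total_machines:
            continue  # violates the cluster-size constraint
        makespan = max(
            runtime(work, overhead, m)
            for (work, overhead), m in zip(jobs, alloc)
        )
        if best is None or makespan < best[0]:
            best = (makespan, alloc)
    return best


if __name__ == "__main__":
    # Two illustrative jobs on a six-machine cluster (made-up numbers).
    jobs = [(120.0, 1.0), (60.0, 1.0)]
    makespan, alloc = best_allocation(jobs, total_machines=6)
    print("machines per job:", alloc, "makespan:", makespan)
```

Exhaustive search is only practical for a handful of jobs and machines; it stands in here for whatever solver a real scheduler would use, and the sketch ignores per-machine resource limits and virtual-machine placement.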