Recently, Hadoop, an open-source implementation of MapReduce, has become very popular thanks to its simple programming model and its built-in support for distributed computing and fault tolerance. Although Hadoop automatically reschedules failed tasks, it cannot deal with tasks that are merely slow, even though such tasks drag down the performance of the whole job. In this work, we design a novel garbage collection technique that identifies and collects such "garbage" (slow) tasks. We address three research questions. First, does collecting (shutting down) garbage tasks reduce the total job completion time and resource cost? Second, when is it most efficient to invoke the Garbage Collector? Third, how can garbage tasks be identified, and what are the major factors causing a task to slow down? The proposed Garbage Collector is evaluated on Amazon EC2 using two metrics: (i) the completion time of a single job, and (ii) resource cost. Empirical results with the TeraSort benchmark show that collecting garbage tasks reduces job completion time by 16% and resource cost by 27%. The results also show that the Garbage Collector must be invoked before the job is 40% complete; beyond that point, the cost of re-executing slow tasks becomes high, and it is better to leave them running until the end of the job. Finally, our results show that CPU utilization is a good indicator of slow tasks.
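The abstract's two main findings, that the Garbage Collector should only run before the job is 40% complete and that low CPU utilization flags a slow task, combine into a simple decision rule. The sketch below illustrates that rule; the specific CPU-utilization threshold is an illustrative assumption, not a value reported in the paper.

```python
def should_collect(job_progress: float, task_cpu_utilization: float,
                   cpu_threshold: float = 0.2) -> bool:
    """Decide whether a slow ("garbage") task should be collected,
    i.e. shut down and rescheduled on another node.

    job_progress          -- fraction of the job completed, in [0, 1]
    task_cpu_utilization  -- the task's CPU utilization, in [0, 1]
    cpu_threshold         -- assumed cutoff below which a task counts
                             as slow (hypothetical value, not from
                             the paper)
    """
    if job_progress >= 0.40:
        # Past 40% completion, re-executing a slow task costs more
        # than letting it run to the end of the job.
        return False
    # Before that point, low CPU utilization identifies the task
    # as "garbage" and it is worth collecting.
    return task_cpu_utilization < cpu_threshold
```

For example, a task at 10% CPU utilization would be collected while the job is 20% complete, but left alone once the job passes the 40% mark.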