MapReduce is a widely used model for data-parallel applications. We found that it utilizes resources inefficiently when there are not enough tasks to fill all task slots, because the resources "reserved" for idle slots are simply wasted. We propose resource stealing, which enables running tasks to steal the unutilized resources and return them when new tasks are assigned. This opportunistic use of otherwise wasted resources improves overall resource utilization and reduces job execution time. In addition, our practical use of Hadoop shows that the current mechanism for triggering speculative execution creates many unnecessary speculative tasks, which are killed soon after creation because the original tasks complete first. To alleviate this issue, we propose Benefit Aware Speculative Execution, which predicts the benefit of running new speculative tasks and greatly reduces unnecessary runs. Finally, MapReduce is mainly optimized for homogeneous environments, and we have observed its inefficiency in heterogeneous network environments in our experiments. We therefore investigate network-heterogeneity-aware scheduling of both map and reduce tasks. Overall, our goal is to enhance Hadoop to cope with significant network heterogeneity and to improve resource utilization.
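The idea behind benefit-aware speculation can be illustrated with a minimal sketch. It is not the authors' implementation and does not use any Hadoop API; the names `RunningTask`, `estimated_remaining_s`, `speculation_benefit_s`, and `should_speculate` are hypothetical. It also assumes a simple linear progress model and assumes that an estimate of a fresh copy's run time is available, neither of which the abstract specifies. A speculative copy is launched only when the original task is predicted to finish later than the copy would.

```python
from dataclasses import dataclass


@dataclass
class RunningTask:
    """Hypothetical snapshot of a running task (field names are assumptions)."""
    progress: float   # fraction of work completed, in [0, 1]
    elapsed_s: float  # seconds since the task started


def estimated_remaining_s(task: RunningTask) -> float:
    """Estimate the time left, assuming progress continues at the average rate so far."""
    if task.progress <= 0.0:
        return float("inf")  # no progress observed, so no finite estimate
    if task.progress >= 1.0:
        return 0.0
    return task.elapsed_s * (1.0 - task.progress) / task.progress


def speculation_benefit_s(task: RunningTask, fresh_copy_estimate_s: float) -> float:
    """Predicted seconds saved if a speculative copy is started now."""
    return estimated_remaining_s(task) - fresh_copy_estimate_s


def should_speculate(task: RunningTask, fresh_copy_estimate_s: float,
                     min_benefit_s: float = 0.0) -> bool:
    """Launch a speculative copy only if the predicted benefit exceeds the threshold."""
    return speculation_benefit_s(task, fresh_copy_estimate_s) > min_benefit_s


if __name__ == "__main__":
    nearly_done = RunningTask(progress=0.9, elapsed_s=90.0)
    straggler = RunningTask(progress=0.1, elapsed_s=90.0)
    # A fresh copy is assumed to take 60 s in this illustration.
    print(should_speculate(nearly_done, fresh_copy_estimate_s=60.0))
    print(should_speculate(straggler, fresh_copy_estimate_s=60.0))
```

Under these assumptions, a task that is nearly finished is left alone, because a copy would be killed before it could help, while a genuine straggler still triggers speculation. The `min_benefit_s` threshold is an illustrative knob for making the policy more conservative.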