Improving MapReduce Performance in Heterogeneous Network Environments and Resource Utilization

Authors:
Zhenhua Guo; Geoffrey Fox
Venue:
CCGRID '12: Proceedings of the 2012 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid 2012)
Year:
2012

Abstract

MapReduce is a widely used model for data-parallel applications. We found that its resource utilization is inefficient when there are not enough tasks to fill all task slots, because the resources "reserved" for idle slots are simply wasted. We propose resource stealing, which enables running tasks to steal these unutilized resources and return them when new tasks are assigned. It exploits the opportunistic use of otherwise wasted resources to improve overall resource utilization and reduce job execution time. In addition, our practical use of Hadoop shows that the current mechanism for triggering speculative execution creates many unnecessary speculative tasks, which are killed soon after creation because the original tasks complete earlier. To alleviate this issue, we propose Benefit Aware Speculative Execution, which predicts the benefit of running new speculative tasks and largely eliminates unnecessary runs. Finally, MapReduce is mainly optimized for homogeneous environments, and our experiments show that it is inefficient in heterogeneous network environments. We investigate network-heterogeneity-aware scheduling of both map and reduce tasks. Overall, our goal is to enhance Hadoop to cope with significant network heterogeneity and to improve resource utilization.
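The abstract does not spell out the exact policies, so the following is only a minimal illustrative sketch of the two ideas, not the authors' implementation. It assumes (1) resource stealing splits the capacity of a node's idle slots evenly among its running tasks, with the stolen share shrinking automatically when a new task is assigned, and (2) a benefit check launches a speculative copy only if the copy is predicted to finish before the original, estimating the original's remaining time from a constant progress rate. The function names `stolen_allocation`, `estimated_remaining`, and `should_speculate`, and the `margin` parameter, are hypothetical.

```python
def stolen_allocation(total_slots: int, running_tasks: list[str]) -> dict[str, float]:
    """Split a node's slot capacity among its running tasks (hypothetical policy).

    Each running task keeps its own slot and receives an equal share of the
    idle slots' resources. Recomputing after a new task is assigned "returns"
    the stolen resources, because the idle pool shrinks.
    """
    if not running_tasks:
        return {}
    if len(running_tasks) > total_slots:
        raise ValueError("more running tasks than slots")
    idle = total_slots - len(running_tasks)
    share = 1.0 + idle / len(running_tasks)  # measured in "slots' worth" of resources
    return {task: share for task in running_tasks}


def estimated_remaining(progress: float, elapsed: float) -> float:
    """Remaining time of a task, assuming its progress rate stays constant."""
    if not 0.0 < progress <= 1.0:
        raise ValueError("progress must be in (0, 1]")
    return elapsed * (1.0 - progress) / progress


def should_speculate(progress: float, elapsed: float,
                     est_new_runtime: float, margin: float = 0.0) -> bool:
    """Launch a speculative copy only if it is predicted to finish first.

    `margin` is a hypothetical safety threshold (in seconds) by which the new
    copy must beat the original's estimated remaining time.
    """
    return est_new_runtime + margin < estimated_remaining(progress, elapsed)


if __name__ == "__main__":
    # Two map tasks on a 4-slot node each get 2 slots' worth of resources.
    print(stolen_allocation(4, ["m1", "m2"]))
    # A task 90% done after 90 s has about 10 s left, so a 30 s copy is not worth it.
    print(should_speculate(0.9, 90.0, 30.0))
    # A task 20% done after 40 s has about 160 s left, so a 30 s copy is worth it.
    print(should_speculate(0.2, 40.0, 30.0))
```

Under these assumptions, the benefit check filters out copies that would be killed soon after creation, which is the waste the abstract describes. A real predictor would also need to account for data locality and network heterogeneity when estimating `est_new_runtime`.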