Network-aware scheduling of MapReduce framework on distributed clusters over high speed networks

Authors:
Praveenkumar Kondikoppa; Chui-Hui Chiu; Cheng Cui; Lin Xue; Seung-Jong Park
Affiliations:
Louisiana State University, Baton Rouge, USA (all authors)
Venue:
Proceedings of the 2012 Workshop on Cloud Services, Federation, and the 8th Open Cirrus Summit
Year:
2012

References:

MapReduce: simplified data processing on large clusters
OSDI'04 Proceedings of the 6th conference on Symposium on Operating Systems Design & Implementation - Volume 6

Delay scheduling: a simple technique for achieving locality and fairness in cluster scheduling
Proceedings of the 5th European conference on Computer systems

A hierarchical framework for cross-domain MapReduce execution
Proceedings of the second international workshop on Emerging computational methods for the life sciences

Exploring MapReduce efficiency with highly-distributed data
Proceedings of the second international workshop on MapReduce and its applications

Abstract

Google's MapReduce has gained significant popularity as a platform for large-scale distributed data processing. Hadoop [1] is an open source implementation of the MapReduce [11] framework. It was originally developed to operate in a single-cluster environment and could not be leveraged for distributed data processing across federated clusters. When multiple federated clusters are connected by high speed networks, computing resources are provisioned from any cluster in the federation. Placing map tasks close to their data splits is critical for Hadoop's performance. In this work, we add network awareness to Hadoop's scheduling of map tasks over federated clusters. We observe a 12% to 15% reduction in execution time with Hadoop's FIFO and FAIR schedulers under varying workloads.
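
The abstract does not describe the scheduling rule itself, so the following is only a minimal sketch of what placing a map task close to its data split could look like across federated clusters. It assumes that network awareness means estimating the time to move a split over the inter-cluster link and preferring the cheapest split for a free worker. The names `Split`, `transfer_seconds`, `pick_split`, the cluster labels, and the bandwidth table are all hypothetical and are not taken from Hadoop or from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Split:
    """Hypothetical description of one input split (not a Hadoop class)."""
    split_id: str
    size_mb: float   # split size in megabytes
    cluster: str     # cluster that stores the split


def transfer_seconds(split, worker_cluster, bandwidth_mbps):
    """Estimate the time to read `split` from a worker in `worker_cluster`.

    Assumption: a local read costs nothing, and a remote read is limited
    only by the link bandwidth (in megabits per second) between clusters.
    """
    if split.cluster == worker_cluster:
        return 0.0
    link = bandwidth_mbps[(split.cluster, worker_cluster)]
    return split.size_mb * 8.0 / link


def pick_split(pending, worker_cluster, bandwidth_mbps):
    """Pick the pending split that is cheapest to read from the worker."""
    return min(
        pending,
        key=lambda s: transfer_seconds(s, worker_cluster, bandwidth_mbps),
    )


# Illustrative data: two clusters joined by a 1000 Mbit/s link.
pending = [Split("s1", 64.0, "A"), Split("s2", 64.0, "B")]
bandwidth = {("A", "B"): 1000.0, ("B", "A"): 1000.0}
chosen = pick_split(pending, "B", bandwidth)
```

In this sketch a free worker in cluster `B` is handed split `s2`, which it can read locally, before `s1`, which would have to cross the inter-cluster link. A real scheduler would also have to weigh fairness and queueing delay, which this sketch ignores.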