Location-Aware MapReduce in Virtual Cloud

Authors:
Yifeng Geng; Shimin Chen; YongWei Wu; Ryan Wu; Guangwen Yang; Weimin Zheng
Venue:
ICPP '11 Proceedings of the 2011 International Conference on Parallel Processing
Year:
2011

Abstract

MapReduce is an important programming model for processing and generating large data sets in parallel. It is commonly used in applications such as web indexing, data mining, and machine learning. Hadoop, an open-source implementation of MapReduce, is now widely used in industry. Virtualization, which is easy to configure and economical to use, shows great potential for cloud computing. With the growing number of cores per CPU and the adoption of virtualization, one physical machine can host more and more virtual machines, but the number of I/O devices normally does not grow as rapidly. Because MapReduce systems often run I/O-intensive applications, reduced data redundancy and load imbalance, both of which increase I/O interference in a virtual cloud, become serious problems. This paper builds a model and defines metrics to analyze the data allocation problem in virtual environments theoretically, and designs a location-aware file block allocation strategy that retains compatibility with native Hadoop. Both model simulation and experiments on a real system show that the new strategy achieves better data redundancy and load balance, reducing I/O interference. The execution time of applications such as RandomWriter, Text Sort, and Word Count is reduced by up to 33%, and by 10% on average.
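The abstract does not spell out the allocation algorithm, so the following is only a minimal Python sketch of the general idea, under stated assumptions. Each VM-hosted datanode is tagged with the physical machine it runs on (the hypothetical `VirtualNode` type). "Effective redundancy" is approximated here as the number of distinct physical hosts holding a block's replicas (a hypothetical stand-in, not the paper's metric). Replicas are spread over the least-loaded physical hosts. All names (`naive_placement`, `location_aware_placement`, `effective_redundancy`) are illustrative and are neither the paper's strategy nor Hadoop's own block placement code.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class VirtualNode:
    """Hypothetical datanode record: a VM plus the physical host it runs on."""
    name: str
    host: str


def effective_redundancy(replicas):
    """Hypothetical metric: number of distinct physical hosts holding replicas.

    Two replicas on VMs that share a physical machine do not protect against
    that machine failing, and they compete for the same disks.
    """
    return len({vm.host for vm in replicas})


def naive_placement(nodes, replication):
    """Location-unaware baseline (illustrative): takes the first VMs in order,
    ignoring which physical host each VM lives on."""
    return nodes[:replication]


def location_aware_placement(nodes, replication, load):
    """Place one replica per physical host, preferring the least-loaded hosts.

    `load` is a Counter mapping VirtualNode -> replicas already stored; it is
    updated in place. If there are fewer physical hosts than `replication`,
    this sketch returns fewer replicas rather than co-locating any.
    """
    by_host = defaultdict(list)
    for vm in nodes:
        by_host[vm.host].append(vm)
    # Physical-host load is the sum of the loads of its VMs.
    host_load = {h: sum(load[vm] for vm in vms) for h, vms in by_host.items()}
    # Least-loaded hosts first; ties broken by host name for determinism.
    hosts = sorted(by_host, key=lambda h: (host_load[h], h))
    chosen = []
    for host in hosts[:replication]:
        # Within a host, pick the least-loaded VM to balance at VM level too.
        vm = min(by_host[host], key=lambda v: (load[v], v.name))
        chosen.append(vm)
        load[vm] += 1
    return chosen


if __name__ == "__main__":
    # 3 physical hosts, each running 2 VMs: vm0/vm1 on host0, vm2/vm3 on host1, ...
    nodes = [VirtualNode(f"vm{i}", f"host{i // 2}") for i in range(6)]
    print(effective_redundancy(naive_placement(nodes, 3)))  # 2
    load = Counter()
    for _ in range(6):
        print(effective_redundancy(location_aware_placement(nodes, 3, load)))  # 3
    print(sorted(load[vm] for vm in nodes))  # [3, 3, 3, 3, 3, 3]
```

With three-way replication, the naive baseline puts two of the three replicas on VMs that share `host0`, so only two physical machines hold the block. The location-aware sketch keeps all three replicas on distinct machines and, over repeated placements, spreads load evenly across both hosts and VMs. This illustrates why co-located VMs can erode redundancy and balance in a virtual cloud.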