Evaluating MapReduce on Virtual Machines: The Hadoop Case

Authors:
Shadi Ibrahim;Hai Jin;Lu Lu;Li Qi;Song Wu;Xuanhua Shi
Affiliations:
Cluster and Grid Computing Lab Services Computing Technology and System Lab, Huazhong University of Science & Technology, Wuhan, China 430074;Cluster and Grid Computing Lab Services Computing Technology and System Lab, Huazhong University of Science & Technology, Wuhan, China 430074;Cluster and Grid Computing Lab Services Computing Technology and System Lab, Huazhong University of Science & Technology, Wuhan, China 430074;Operation Center, China Development Bank, Beijing, China;Cluster and Grid Computing Lab Services Computing Technology and System Lab, Huazhong University of Science & Technology, Wuhan, China 430074;Cluster and Grid Computing Lab Services Computing Technology and System Lab, Huazhong University of Science & Technology, Wuhan, China 430074
Venue:
CloudCom '09 Proceedings of the 1st International Conference on Cloud Computing
Year:
2009

Abstract

MapReduce is emerging as an important programming model for large-scale parallel applications. Meanwhile, Hadoop is an open source implementation of MapReduce that enjoys wide popularity for developing data-intensive applications in the cloud. Because the computing unit in the cloud is virtual machine (VM) based, it is feasible to demonstrate the applicability of MapReduce in a virtualized data center. Although the potential for poor performance and heavy load no doubt exists, virtual machines can be used to fully utilize system resources, ease the management of such systems, improve reliability, and save power. In this paper, a series of experiments is conducted to measure and analyze the performance of Hadoop on VMs. These experiments serve as a basis for outlining several issues that will need to be considered when implementing MapReduce to fit completely in the cloud.