An adaptive data transfer algorithm using block device reconfiguration in virtual MapReduce clusters

Authors:
Kwonyong Lee;Yoonsung Nam;Taekhee Kim;Sungyong Park
Affiliations:
Sogang University, Mapo-Gu, Seoul, Korea;Sogang University, Mapo-Gu, Seoul, Korea;Sogang University, Mapo-Gu, Seoul, Korea;Sogang University, Mapo-Gu, Seoul, Korea
Venue:
Proceedings of the 2013 ACM Cloud and Autonomic Computing Conference
Year:
2013


Abstract

With the proliferation of cloud computing and virtual machine technologies, MapReduce applications are increasingly deployed in clouds to leverage the full potential of cloud computing environments. However, MapReduce, which is generally used to process large amounts of data, suffers from I/O virtualization overhead and resource contention among virtual machines when it runs in virtualized clouds. This paper proposes an adaptive data transfer algorithm for virtual MapReduce clusters. The proposed algorithm uses a block device reconfiguration scheme, in which a block device attached to one virtual machine can be dynamically detached and reattached to another virtual machine hosted on the same physical machine. By reconfiguring block devices, files can be moved between virtual machines on the same physical machine without any network transfer between them. When the output of a map task is transferred to a reducer, the algorithm adaptively chooses between network transfer and block device reconfiguration based on the current CPU utilization and the size of the data to be transferred. Even when data is transferred between virtual machines on different physical machines, the scheme removes the transfer overhead between the virtual machine and the driver domain, which reduces both the data transfer time and the performance impact on other virtual machines during the shuffle phase. We have implemented the algorithm in Hadoop MapReduce. Benchmark results show that the overhead of transferring data from mapper virtual machines to reducer virtual machines is minimized and that the execution times of MapReduce applications are shortened.
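The abstract states that the choice between network transfer and block device reconfiguration depends on CPU utilization and data size, but it does not give the decision rule. The following Python sketch illustrates one way such a choice could be made. It is not the authors' algorithm: the cost model, the parameter values, and all names (`CostModel`, `choose_transfer_method`, and so on) are hypothetical. It assumes that usable network bandwidth shrinks as CPU utilization rises and that reconfiguration pays a fixed detach and reattach cost.

```python
from dataclasses import dataclass

NETWORK = "network"
RECONFIG = "block_device_reconfiguration"


@dataclass(frozen=True)
class CostModel:
    """Hypothetical parameters for illustration; none come from the paper."""
    net_bandwidth: float = 100e6     # bytes/s over the virtual network when the CPU is idle
    disk_bandwidth: float = 200e6    # bytes/s for writing map output to the block device
    reconfig_overhead: float = 2.0   # seconds to detach and reattach a block device


def estimate_network_time(size_bytes: float, cpu_util: float, model: CostModel) -> float:
    # Assumption: usable bandwidth falls linearly with CPU utilization,
    # with a small floor so the estimate stays finite at 100% utilization.
    headroom = max(1.0 - cpu_util, 0.05)
    return size_bytes / (model.net_bandwidth * headroom)


def estimate_reconfig_time(size_bytes: float, model: CostModel) -> float:
    # Assumption: a fixed reconfiguration cost plus the time to write the data.
    return model.reconfig_overhead + size_bytes / model.disk_bandwidth


def choose_transfer_method(size_bytes: float, cpu_util: float,
                           model: CostModel = CostModel()) -> str:
    """Pick the transfer method with the lower estimated time."""
    if size_bytes < 0:
        raise ValueError("size_bytes must be non-negative")
    if not 0.0 <= cpu_util <= 1.0:
        raise ValueError("cpu_util must be in [0, 1]")
    net_time = estimate_network_time(size_bytes, cpu_util, model)
    reconfig_time = estimate_reconfig_time(size_bytes, model)
    return NETWORK if net_time <= reconfig_time else RECONFIG
```

Under these assumed parameters, small outputs on a lightly loaded host go over the network because the fixed reconfiguration cost dominates, while large outputs or heavily loaded hosts favor reconfiguration. In a real deployment the parameters would have to be measured rather than guessed.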