With the proliferation of cloud computing and virtual machine technologies, MapReduce applications are increasingly deployed in clouds to leverage the full potential of cloud computing environments. However, MapReduce, which is generally used to process large amounts of data, suffers from I/O virtualization overheads and resource competition among virtual machines when it runs on virtualized clouds. This paper proposes an adaptive data transfer algorithm for virtual MapReduce clusters. The proposed algorithm uses a block device reconfiguration scheme, in which a block device attached to one virtual machine can be dynamically detached and reattached to another virtual machine hosted on the same physical machine. By reconfiguring block devices, we can move files between virtual machines on the same physical machine without any network transfer between them. When the output of each map task is transferred to the reducer, the algorithm adaptively chooses between network transfer and block device reconfiguration based on the current CPU utilization and the size of the data to be transferred. Even when data is transferred between virtual machines on different physical machines, we can remove the transfer overhead between the virtual machine and the driver domain, which reduces both the data transfer time and the performance impact on other virtual machines during the shuffle phase. We have implemented our algorithm in Hadoop MapReduce. Benchmarking results show that the overheads incurred by transferring data from mapper virtual machines to reducer virtual machines are minimized and the execution times of MapReduce applications are shortened.
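The adaptive choice between the two transfer methods can be illustrated with a minimal sketch. The abstract states only that the decision depends on current CPU utilization and data size. The rule below (prefer block device reconfiguration when the data is large or the CPU is busy), the threshold values, and all names such as `choose_transfer_method` and `TransferContext` are hypothetical assumptions made for illustration, not the paper's actual algorithm.

```python
from dataclasses import dataclass
from enum import Enum


class TransferMethod(Enum):
    NETWORK = "network_transfer"
    BLOCK_DEVICE = "block_device_reconfiguration"


@dataclass(frozen=True)
class TransferContext:
    """Inputs named in the abstract: data size and current CPU utilization."""
    data_size_bytes: int
    cpu_utilization: float  # fraction in [0.0, 1.0]


def choose_transfer_method(
    ctx: TransferContext,
    size_threshold_bytes: int = 64 * 1024 * 1024,  # hypothetical threshold
    cpu_threshold: float = 0.8,                    # hypothetical threshold
) -> TransferMethod:
    """Pick a transfer method for one map output.

    Assumed rule (not taken from the paper): detaching and reattaching a
    block device has a fixed cost, so it is chosen only when the data is
    large enough to amortize that cost, or when the CPU is too busy to
    absorb the processing cost of a network transfer.
    """
    if ctx.data_size_bytes < 0:
        raise ValueError("data_size_bytes must be non-negative")
    if not 0.0 <= ctx.cpu_utilization <= 1.0:
        raise ValueError("cpu_utilization must be in [0.0, 1.0]")

    is_large = ctx.data_size_bytes >= size_threshold_bytes
    is_busy = ctx.cpu_utilization >= cpu_threshold
    if is_large or is_busy:
        return TransferMethod.BLOCK_DEVICE
    return TransferMethod.NETWORK
```

Under these assumed thresholds, a 1 MiB map output on a lightly loaded host would go over the network, while a 1 GiB output, or any output produced while CPU utilization is at 90%, would be moved by reconfiguring the block device. A real implementation would presumably derive the thresholds from measurements rather than fixed constants.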