Adaptive Disk I/O Scheduling for MapReduce in Virtualized Environment

Authors:
Shadi Ibrahim;Hai Jin;Lu Lu;Bingsheng He;Song Wu
Venue:
ICPP '11 Proceedings of the 2011 International Conference on Parallel Processing
Year:
2011

Citing 0
Cited 7

Efficient Disk I/O Scheduling with QoS Guarantee for Xen-based Hosting Platforms
CCGRID '12 Proceedings of the 2012 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (ccgrid 2012)

Maestro: Replica-Aware Map Scheduling for MapReduce
CCGRID '12 Proceedings of the 2012 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (ccgrid 2012)

Assessing MapReduce for Internet Computing: A Comparison of Hadoop and BitDew-MapReduce
GRID '12 Proceedings of the 2012 ACM/IEEE 13th International Conference on Grid Computing

Minimizing Cost of Virtual Machines for Deadline-Constrained MapReduce Applications in the Cloud
GRID '12 Proceedings of the 2012 ACM/IEEE 13th International Conference on Grid Computing

MROrder: flexible job ordering optimization for online mapreduce workloads
Euro-Par'13 Proceedings of the 19th international conference on Parallel Processing

Black box scheduling for resource intensive virtual machine workloads with interference models
Future Generation Computer Systems

Flubber: Two-level disk scheduling in virtualized environment
Future Generation Computer Systems

Abstract

Virtual machine (VM) interference has long been a challenging problem for performance predictability and system throughput in large-scale virtualized cloud environments. This interference arises from intertwined factors, including the application's type, the number of concurrent VMs, and the VM scheduling algorithms used within the host. Since MapReduce has become an important data processing platform in the cloud, we investigate the impact of disk schedulers on Hadoop. Interestingly, our experimental results show a noticeable variation in Hadoop performance across applications when different disk pairs' schedulers are applied in both the hypervisor and the virtual machines. Furthermore, a typical Hadoop application consists of different interleaving stages, each with different I/O workloads and patterns. As a result, a fixed choice of disk pairs' schedulers is sub-optimal not only across different MapReduce applications but also across the sub-phases of a single job. Accordingly, this paper presents a novel approach for adaptively tuning the disk pairs' schedulers in both the hypervisor and the virtual machines during the execution of a single MapReduce job. Our results show that MapReduce performance can be significantly improved; specifically, adaptive tuning of disk pairs' schedulers achieves a 25% performance improvement on a sort benchmark with Hadoop.