The distributed data processing framework MapReduce is increasingly deployed in Clouds to leverage the pay-per-usage cloud computing model. The popular Hadoop MapReduce environment expects end users to determine the type and amount of Cloud resources to reserve as well as the configuration of Hadoop parameters. However, such resource reservation and job provisioning decisions require in-depth knowledge of system internals and laborious but often ineffective parameter tuning. We propose and develop AROMA, a system that automates the allocation of heterogeneous Cloud resources and the configuration of Hadoop parameters to achieve quality of service goals while minimizing the incurred cost. It addresses the significant challenge of provisioning ad-hoc jobs that have performance deadlines in Clouds through a novel two-phase machine learning and optimization framework. Its technical core is a support vector machine based performance model that enables the integration of various aspects of resource provisioning and auto-configuration of Hadoop jobs. It adapts to ad-hoc jobs by robustly matching their resource utilization signature with those of previously executed jobs and making provisioning decisions accordingly. We implement AROMA as an automated job provisioning system for Hadoop MapReduce hosted on virtualized HP ProLiant blade servers. Experimental results show AROMA's effectiveness in providing performance guarantees for diverse Hadoop benchmark jobs while minimizing the cost of Cloud resource usage.
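The two-phase idea described above (match a job's resource utilization signature to previously seen jobs, then search for the cheapest configuration whose predicted runtime meets the deadline) can be illustrated with a minimal Python sketch. Everything concrete in it is an assumption made for illustration and is not taken from AROMA: the nearest-centroid Euclidean matching, the closed-form `predict_runtime_s` function standing in for the support vector machine based performance model, the VM types, prices, speeds, and the `map_slots` parameter.

```python
from dataclasses import dataclass
from itertools import product
from math import sqrt


@dataclass(frozen=True)
class Config:
    vm_type: str    # hypothetical VM flavor
    num_vms: int    # number of VMs reserved
    map_slots: int  # hypothetical Hadoop tuning knob


# Hypothetical catalogue: hourly price and relative speed per VM type.
PRICE_PER_VM_HOUR = {"small": 0.10, "large": 0.40}
SPEED = {"small": 1.0, "large": 3.0}
# Hypothetical efficiency of each map-slot setting.
SLOT_EFFICIENCY = {1: 0.6, 2: 1.0, 4: 0.9}
# Hypothetical seconds of work per GB for each job cluster.
WORK_PER_GB = {"cpu_bound": 120.0, "io_bound": 60.0}
STARTUP_S = 30.0  # assumed fixed job startup overhead


def nearest_cluster(signature, centroids):
    """Phase 1 (assumed form): pick the cluster of past jobs whose
    utilization centroid is closest to the new job's signature."""
    def dist(a, b):
        return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return min(centroids, key=lambda name: dist(signature, centroids[name]))


def predict_runtime_s(cluster, input_gb, cfg):
    """Toy stand-in for a learned performance model (not an SVM)."""
    rate = SPEED[cfg.vm_type] * cfg.num_vms * SLOT_EFFICIENCY[cfg.map_slots]
    return STARTUP_S + WORK_PER_GB[cluster] * input_gb / rate


def cost_usd(cfg, runtime_s):
    """Pay-per-usage cost of running cfg for runtime_s seconds."""
    return PRICE_PER_VM_HOUR[cfg.vm_type] * cfg.num_vms * runtime_s / 3600.0


def cheapest_config(cluster, input_gb, deadline_s, max_vms=8):
    """Phase 2 (assumed form): exhaustive search for the lowest-cost
    configuration whose predicted runtime meets the deadline."""
    best, best_cost = None, float("inf")
    for vm_type, n, slots in product(SPEED, range(1, max_vms + 1), SLOT_EFFICIENCY):
        cfg = Config(vm_type, n, slots)
        runtime = predict_runtime_s(cluster, input_gb, cfg)
        if runtime <= deadline_s and cost_usd(cfg, runtime) < best_cost:
            best, best_cost = cfg, cost_usd(cfg, runtime)
    return best  # None when no configuration meets the deadline


# Usage: signature = (cpu, disk, network) utilization of a short profiling run.
centroids = {"cpu_bound": (0.9, 0.2, 0.1), "io_bound": (0.3, 0.8, 0.4)}
cluster = nearest_cluster((0.85, 0.25, 0.15), centroids)
plan = cheapest_config(cluster, input_gb=10, deadline_s=300)
```

In this toy setting the search space is small enough to enumerate. A real system would replace the closed-form predictor with a model trained on measured job runs, and would likely need a more scalable optimizer than brute force.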