Improving scheduling performance using a Q-learning-based leasing policy for clouds

  • Authors:
  • Alexander Fölling; Matthias Hofmann

  • Affiliations:
  • Robotics Research Institute, TU Dortmund University, Dortmund, Germany; D-Grid GmbH, Dortmund, Germany

  • Venue:
  • Euro-Par'12: Proceedings of the 18th International Conference on Parallel Processing
  • Year:
  • 2012

Abstract

Academic data centers handle the major share of scientific computing, and the user-generated workload changes with upcoming research projects. Especially in phases of high computational demand, it can be useful to temporarily extend the local site by leasing resources from a cloud computing provider, e.g. Amazon EC2, to improve the service for the local user community. We present a reinforcement learning-based policy that controls the maximum leasing size in an online, adaptive fashion, taking into account the current resource/workload state and the balance between scheduling benefits and costs. Further, we provide an appropriate model to evaluate such policies and present heuristics that determine upper and lower reference values for performance evaluation under this model. Using event-driven simulation and real workload traces, we investigate the dynamics of the learning policy and demonstrate its adaptivity to workload changes. By expressing its performance as a ratio between costs and scheduling improvement relative to the upper and lower reference heuristics, we demonstrate the benefit of our concept.
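
To illustrate the kind of policy the abstract describes, the following is a minimal Q-learning sketch in Python. The state discretization, action set (candidate maximum leasing sizes), reward function, and learning parameters are illustrative assumptions and are not taken from the paper; the paper's actual model, reward definition, and evaluation setup differ.

    import random
    from collections import defaultdict

    # Minimal Q-learning sketch for a cloud-leasing policy (illustrative only).
    # State: a discretized local load level (hypothetical feature).
    # Action: the maximum number of cloud resources allowed to be leased.
    # Reward: assumed trade-off between scheduling benefit and leasing cost.

    ACTIONS = [0, 8, 16, 32]            # candidate maximum leasing sizes (assumed)
    ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1

    q_table = defaultdict(float)        # maps (state, action) -> estimated value

    def choose_action(state):
        """Epsilon-greedy selection over the candidate leasing limits."""
        if random.random() < EPSILON:
            return random.choice(ACTIONS)
        return max(ACTIONS, key=lambda a: q_table[(state, a)])

    def update(state, action, reward, next_state):
        """Standard Q-learning temporal-difference update."""
        best_next = max(q_table[(next_state, a)] for a in ACTIONS)
        td_target = reward + GAMMA * best_next
        q_table[(state, action)] += ALPHA * (td_target - q_table[(state, action)])

    def toy_reward(load_level, max_lease):
        """Placeholder reward: assumed benefit of leasing minus assumed cost."""
        benefit = min(load_level * 10, max_lease)
        cost = 0.5 * max_lease
        return benefit - cost

    # Toy training loop with random load transitions (placeholder dynamics).
    state = 2                           # e.g. a "high load" bucket
    for _ in range(1000):
        action = choose_action(state)
        next_state = random.randint(0, 3)
        update(state, action, toy_reward(state, action), next_state)
        state = next_state

    print({a: round(q_table[(2, a)], 2) for a in ACTIONS})

In this sketch the learned Q-values steer the leasing limit upward only when the assumed scheduling benefit outweighs the assumed cost, which mirrors (in simplified form) the cost/benefit balance the policy in the paper is designed to learn online.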