Automatic exploration of datacenter performance regimes

Authors:
Peter Bodik; Rean Griffith; Charles Sutton; Armando Fox; Michael I. Jordan; David A. Patterson
Affiliations:
University of California, Berkeley, Berkeley, CA, USA (all authors)
Venue:
ACDC '09 Proceedings of the 1st workshop on Automated control for datacenters and clouds
Year:
2009

Abstract

Horizontally scalable Internet services present an opportunity to use automatic resource allocation strategies for system management in the datacenter. In most previous work, a controller uses a performance model of the system to decide on the optimal allocation of resources. However, these models are usually trained offline or on a small-scale deployment and will not accurately capture the performance of the controlled application. To control a web application accurately, the model needs to be trained directly on the production system and adapted to changes in the application's workload and performance. In this paper we propose training the performance model with an exploration policy that quickly collects data from different performance regimes of the application. The exploration process must balance two goals: avoiding violations of the performance SLAs, and collecting enough data to train an accurate performance model, which requires pushing the system close to its capacity. We show that with our exploration policy we can train a performance model of a Web 2.0 application in less than an hour and then immediately use the model in a resource allocation controller.
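
The abstract does not spell out the exploration policy, so the following is only an illustrative sketch of the trade-off it describes, not the authors' method. It assumes a hypothetical policy that removes one server at a time, pushing the remaining servers toward capacity, and stops once measured latency exceeds a safety fraction of the SLA. The latency curve, the function names (`explore`, `simulated_latency_ms`), and all parameters (`safety`, `capacity`, `base_ms`) are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ExplorationResult:
    # Each sample is (request rate per server, observed latency in ms).
    samples: list = field(default_factory=list)
    sla_violations: int = 0


def simulated_latency_ms(rate_per_server, capacity=100.0, base_ms=20.0):
    """Toy stand-in for the production system (hypothetical queueing-like curve)."""
    utilization = min(rate_per_server / capacity, 0.99)
    return base_ms / (1.0 - utilization)


def explore(total_rate, max_servers, sla_ms, safety=0.8, measure=simulated_latency_ms):
    """Remove one server at a time while latency stays below safety * sla_ms.

    Every step records a training sample; the loop stops as soon as the
    system is judged too close to the SLA to keep pushing.
    """
    result = ExplorationResult()
    servers = max_servers
    while servers >= 1:
        rate = total_rate / servers
        latency = measure(rate)
        result.samples.append((rate, latency))
        if latency > sla_ms:
            result.sla_violations += 1
        if latency > safety * sla_ms:
            break  # near capacity: stop exploring this direction
        servers -= 1
    return result


if __name__ == "__main__":
    result = explore(total_rate=1700.0, max_servers=30, sla_ms=150.0)
    print(len(result.samples), result.sla_violations)
```

The samples collected this way span low-load through near-capacity regimes and could then be used to fit a performance model. A smaller `safety` value lowers the risk of SLA violations at the cost of less data near capacity, which is the balance the abstract refers to.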