vPerfGuard: an automated model-driven framework for application performance diagnosis in consolidated cloud environments

Authors:
Pengcheng Xiong;Calton Pu;Xiaoyun Zhu;Rean Griffith
Affiliations:
Georgia Institute of Technology, Atlanta, GA, USA;Georgia Institute of Technology, Atlanta, GA, USA;VMware Inc., Palo Alto, CA, USA;VMware Inc., Palo Alto, CA, USA
Venue:
Proceedings of the 4th ACM/SPEC International Conference on Performance Engineering
Year:
2013

Abstract

Many business customers hesitate to move all their applications to the cloud due to performance concerns. White-box diagnosis relies on human expert experience or performance troubleshooting "cookbooks" to find potential performance bottlenecks. Despite wide adoption, the scalability and adaptivity of such approaches remain severely constrained, especially in a highly dynamic, consolidated cloud environment. Leveraging the rich telemetry collected from applications and systems in the cloud, and the power of statistical learning, vPerfGuard complements the existing approaches with a model-driven framework by: (1) automatically identifying the system metrics that are most predictive of application performance, and (2) adaptively detecting changes in performance and potential shifts in the predictive metrics that may accompany such changes. Although correlation does not imply causation, the predictive system metrics point to potential causes that can guide a cloud service provider to zero in on the root cause. We have implemented vPerfGuard as a combination of three modules: a sensor module, a model building module, and a model updating module. We evaluate its effectiveness using different benchmarks and workload types, focusing on resource (CPU, memory, disk I/O) contention scenarios caused by workload surges or "noisy neighbors". The results show that vPerfGuard automatically points to the correct performance bottleneck in each scenario, including the type of the contended resource and the host where the contention occurred.
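
The two steps named in the abstract (finding the most predictive metrics, then noticing when they shift) can be illustrated with a minimal Python sketch. It is not the vPerfGuard implementation: it assumes a plain Pearson-correlation ranking and a fixed split between two observation windows, and the metric names, sample values, and function names are all hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)

def rank_metrics(metrics, performance):
    """Return metric names sorted by |r| against performance, strongest first."""
    scores = {name: abs(pearson(vals, performance)) for name, vals in metrics.items()}
    return sorted(scores, key=scores.get, reverse=True)

def top_metric_shift(metrics, performance, split):
    """Compare the top-ranked metric before and after a sample index `split`."""
    before = rank_metrics({k: v[:split] for k, v in metrics.items()},
                          performance[:split])[0]
    after = rank_metrics({k: v[split:] for k, v in metrics.items()},
                         performance[split:])[0]
    return before, after, before != after

# Hypothetical telemetry: 12 samples of response time (ms) and two system metrics.
# In the first 6 samples response time follows CPU usage on host1;
# in the last 6 it follows disk latency on host2.
response_ms = [100, 120, 140, 160, 180, 200, 200, 260, 320, 380, 440, 500]
metrics = {
    "host1.cpu":  [10, 12, 14, 16, 18, 20, 20, 21, 20, 21, 20, 21],
    "host2.disk": [5, 5, 6, 5, 6, 5, 10, 20, 30, 40, 50, 60],
}

before, after, changed = top_metric_shift(metrics, response_ms, split=6)
print(before, after, changed)
```

In this toy data the top-ranked metric moves from the CPU metric on one host to the disk metric on another, which is the kind of shift a provider could use as a pointer toward the contended resource and host. A real system would need to choose windows adaptively and handle correlated or redundant metrics, which this sketch does not attempt.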