vPerfGuard: an automated model-driven framework for application performance diagnosis in consolidated cloud environments

  • Authors:
  • Pengcheng Xiong; Calton Pu; Xiaoyun Zhu; Rean Griffith

  • Affiliations:
  • Georgia Institute of Technology, Atlanta, GA, USA; Georgia Institute of Technology, Atlanta, GA, USA; VMware Inc., Palo Alto, CA, USA; VMware Inc., Palo Alto, CA, USA

  • Venue:
  • Proceedings of the 4th ACM/SPEC International Conference on Performance Engineering
  • Year:
  • 2013

Abstract

Many business customers hesitate to move all their applications to the cloud due to performance concerns. White-box diagnosis relies on human expert experience or performance troubleshooting "cookbooks" to find potential performance bottlenecks. Despite wide adoption, the scalability and adaptivity of such approaches remain severely constrained, especially in a highly dynamic, consolidated cloud environment. Leveraging the rich telemetry collected from applications and systems in the cloud, and the power of statistical learning, vPerfGuard complements the existing approaches with a model-driven framework by: (1) automatically identifying system metrics that are most predictive of application performance, and (2) adaptively detecting changes in the performance and potential shifts in the predictive metrics that may accompany such a change. Although correlation does not imply causation, the predictive system metrics point to potential causes that can guide a cloud service provider to zero in on the root cause. We have implemented vPerfGuard as a combination of three modules: a sensor module, a model building module, and a model updating module. We evaluate its effectiveness using different benchmarks and different workload types, specifically focusing on various resource (CPU, memory, disk I/O) contention scenarios that are caused by workload surges or "noisy neighbors". The results show that vPerfGuard automatically points to the correct performance bottleneck in each scenario, including the type of the contended resource and the host where the contention occurred.
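
The two steps named in the abstract, selecting the most predictive metrics and detecting when the model no longer fits, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the learning technique, so the sketch assumes plain Pearson correlation for metric ranking, a one-variable least-squares model, and a mean-absolute-error threshold for triggering an update. The metric names (host1.cpu_util, host1.mem_used, host2.disk_io_wait), the synthetic data, and the threshold factor are all hypothetical.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def rank_metrics(metrics, perf):
    """Return metric names sorted by |correlation| with perf, strongest first."""
    scores = {name: abs(pearson(vals, perf)) for name, vals in metrics.items()}
    return sorted(scores, key=scores.get, reverse=True)


def fit_line(xs, ys):
    """Least-squares fit of ys ~ a * xs + b; returns the pair (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    vx = sum((x - mx) ** 2 for x in xs)
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / vx if vx else 0.0
    return a, my - a * mx


def mean_abs_error(model, xs, ys):
    """Mean absolute prediction error of a linear model (a, b) on the given data."""
    a, b = model
    return sum(abs(a * x + b - y) for x, y in zip(xs, ys)) / len(xs)


def needs_update(model, xs, ys, baseline_error, factor=3.0):
    """Flag the model as stale when its error exceeds factor * baseline_error.

    The factor of 3.0 is an arbitrary illustrative threshold.
    """
    return mean_abs_error(model, xs, ys) > factor * baseline_error


def make_phase(rng, driver, gain, n=200):
    """Generate synthetic telemetry in which `driver` determines response time."""
    metrics = {
        "host1.cpu_util": [rng.uniform(20, 90) for _ in range(n)],
        "host1.mem_used": [rng.uniform(30, 60) for _ in range(n)],
        "host2.disk_io_wait": [rng.uniform(0, 10) for _ in range(n)],
    }
    perf = [gain * v + rng.gauss(0, 5) for v in metrics[driver]]
    return metrics, perf


def demo():
    rng = random.Random(0)

    # Phase 1: response time is driven by CPU utilization on host1.
    m1, p1 = make_phase(rng, "host1.cpu_util", 2.0)
    top1 = rank_metrics(m1, p1)[0]
    model = fit_line(m1[top1], p1)
    baseline = mean_abs_error(model, m1[top1], p1)

    # Phase 2: a simulated "noisy neighbor" makes disk I/O on host2 the bottleneck.
    m2, p2 = make_phase(rng, "host2.disk_io_wait", 30.0)
    drifted = needs_update(model, m2[top1], p2, baseline)

    # Re-rank the metrics on the new data to see what is predictive now.
    top2 = rank_metrics(m2, p2)[0]
    return top1, drifted, top2


if __name__ == "__main__":
    top1, drifted, top2 = demo()
    print("most predictive metric before the change:", top1)
    print("model flagged as stale after the change:", drifted)
    print("most predictive metric after the change:", top2)
```

In this sketch the metric name carries both the host and the resource type, which mirrors the abstract's point that the predictive metrics indicate where and on which resource contention occurs. As the abstract notes, such a ranking is correlational and only points to candidate causes.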