As cloud service providers use server virtualization to host applications in virtual machines (VMs), they must allocate resource capacity properly to satisfy the service level agreements (SLAs) contracted with application owners. However, the ever-growing number of virtual and physical machines in such infrastructures makes it increasingly difficult to quickly and effectively localize the system bottlenecks that lead to SLA violations. This paper describes DAPA, a new performance diagnostic framework that helps system administrators analyze application performance anomalies and identify potential causes of SLA violations. DAPA incorporates several customized statistical techniques to capture the quantitative relationship between application performance and virtualized system metrics. We have built a prototype implementation of DAPA on a cluster of virtualized systems and used it to diagnose a set of SLA violations for an enterprise application. Preliminary evaluation results show that DAPA is able to localize the most suspicious attributes of the virtual machines and physical hosts that are related to the SLA violations.