Comprehensive depiction of configuration-dependent performance anomalies in distributed server systems

Authors:
Christopher Stewart;Ming Zhong;Kai Shen;Thomas O'Neill
Affiliations:
Department of Computer Science, University of Rochester;Department of Computer Science, University of Rochester;Department of Computer Science, University of Rochester;Department of Computer Science, University of Rochester
Venue:
HotDep'06 Proceedings of the Second conference on Hot topics in system dependability
Year:
2006

Citing 16
Cited 0

Managing energy and server resources in hosting centers

SOSP '01 Proceedings of the eighteenth ACM symposium on Operating systems principles
Minerva: An automated resource provisioning tool for large-scale storage systems

ACM Transactions on Computer Systems (TOCS)
Induction of Decision Trees

Machine Learning
Pinpoint: Problem Determination in Large, Dynamic Internet Services

DSN '02 Proceedings of the 2002 International Conference on Dependable Systems and Networks
Performance debugging for distributed systems of black boxes

SOSP '03 Proceedings of the nineteenth ACM symposium on Operating systems principles
Integrated resource management for cluster-based internet services

OSDI '02 Proceedings of the 5th symposium on Operating systems design and implementationCopyright restrictions prevent ACM from being able to make the PDFs for this conference available for downloading
An analytical model for multi-tier internet services and its applications

SIGMETRICS '05 Proceedings of the 2005 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Capturing, indexing, clustering, and retrieving system history

Proceedings of the twentieth ACM symposium on Operating systems principles
I/O system performance debugging using model-driven anomaly characterization

FAST'05 Proceedings of the 4th conference on USENIX Conference on File and Storage Technologies - Volume 4
Path-based faliure and evolution management

NSDI'04 Proceedings of the 1st conference on Symposium on Networked Systems Design and Implementation - Volume 1
Performance modeling and system management for multi-component online services

NSDI'05 Proceedings of the 2nd conference on Symposium on Networked Systems Design & Implementation - Volume 2
Correlating instrumentation data to system states: a building block for automated diagnosis and control

OSDI'04 Proceedings of the 6th conference on Symposium on Opearting Systems Design & Implementation - Volume 6
Using magpie for request extraction and workload modelling

OSDI'04 Proceedings of the 6th conference on Symposium on Opearting Systems Design & Implementation - Volume 6
Model-based resource provisioning in a web service utility

USITS'03 Proceedings of the 4th conference on USENIX Symposium on Internet Technologies and Systems - Volume 4
Detecting performance anomalies in global applications

WORLDS'05 Proceedings of the 2nd conference on Real, Large Distributed Systems - Volume 2
Pip: detecting the unexpected in distributed systems

NSDI'06 Proceedings of the 3rd conference on Networked Systems Design & Implementation - Volume 3

Quantified Score

Hi-index	0.00

Visualization

Abstract

Distributed server systems allow many configuration settings and support various application workloads. Often performance anomalies, situations where actual performance falls below expectations, only manifest under particular runtime conditions. This paper presents a new approach to examine a large space of potential runtime conditions and to comprehensively depict the conditions under which performance anomalies are likely to occur. In our approach, we derive our performance expectations from a hierarchy of sub-models in which each submodel can be independently adjusted to consider new runtime conditions. We then produce a representative set of measured runtime condition samples (both normal and abnormal)with carefully chosen sample size and anomaly error threshold. Finally, we employ decision tree based classification to produce an easy-to-interpret depiction of the entire space of potential runtime conditions. Our depictions can be used to guide the avoidance of anomaly-inducing system configurations and it can also assist root cause diagnosis and performance debugging. We present preliminary experimental results with a real J2EE middleware system.