Managing energy and server resources in hosting centers
SOSP '01 Proceedings of the eighteenth ACM symposium on Operating systems principles
Minerva: An automated resource provisioning tool for large-scale storage systems
ACM Transactions on Computer Systems (TOCS)
Machine Learning
Pinpoint: Problem Determination in Large, Dynamic Internet Services
DSN '02 Proceedings of the 2002 International Conference on Dependable Systems and Networks
Performance debugging for distributed systems of black boxes
SOSP '03 Proceedings of the nineteenth ACM symposium on Operating systems principles
Integrated resource management for cluster-based internet services
OSDI '02 Proceedings of the 5th symposium on Operating systems design and implementationCopyright restrictions prevent ACM from being able to make the PDFs for this conference available for downloading
An analytical model for multi-tier internet services and its applications
SIGMETRICS '05 Proceedings of the 2005 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Capturing, indexing, clustering, and retrieving system history
Proceedings of the twentieth ACM symposium on Operating systems principles
I/O system performance debugging using model-driven anomaly characterization
FAST'05 Proceedings of the 4th conference on USENIX Conference on File and Storage Technologies - Volume 4
Path-based faliure and evolution management
NSDI'04 Proceedings of the 1st conference on Symposium on Networked Systems Design and Implementation - Volume 1
Performance modeling and system management for multi-component online services
NSDI'05 Proceedings of the 2nd conference on Symposium on Networked Systems Design & Implementation - Volume 2
OSDI'04 Proceedings of the 6th conference on Symposium on Opearting Systems Design & Implementation - Volume 6
Using magpie for request extraction and workload modelling
OSDI'04 Proceedings of the 6th conference on Symposium on Opearting Systems Design & Implementation - Volume 6
Model-based resource provisioning in a web service utility
USITS'03 Proceedings of the 4th conference on USENIX Symposium on Internet Technologies and Systems - Volume 4
Detecting performance anomalies in global applications
WORLDS'05 Proceedings of the 2nd conference on Real, Large Distributed Systems - Volume 2
Pip: detecting the unexpected in distributed systems
NSDI'06 Proceedings of the 3rd conference on Networked Systems Design & Implementation - Volume 3
Hi-index | 0.00 |
Distributed server systems allow many configuration settings and support various application workloads. Often performance anomalies, situations where actual performance falls below expectations, only manifest under particular runtime conditions. This paper presents a new approach to examine a large space of potential runtime conditions and to comprehensively depict the conditions under which performance anomalies are likely to occur. In our approach, we derive our performance expectations from a hierarchy of sub-models in which each submodel can be independently adjusted to consider new runtime conditions. We then produce a representative set of measured runtime condition samples (both normal and abnormal)with carefully chosen sample size and anomaly error threshold. Finally, we employ decision tree based classification to produce an easy-to-interpret depiction of the entire space of potential runtime conditions. Our depictions can be used to guide the avoidance of anomaly-inducing system configurations and it can also assist root cause diagnosis and performance debugging. We present preliminary experimental results with a real J2EE middleware system.