Reference-driven performance anomaly identification

Authors:
Kai Shen; Christopher Stewart; Chuanpeng Li; Xin Li
Affiliations:
University of Rochester, Rochester, NY, USA (all four authors)
Venue:
Proceedings of the eleventh international joint conference on Measurement and modeling of computer systems
Year:
2009

Abstract

Complex system software supports a wide variety of execution conditions, spanning system configurations and workload properties. This paper explores a principled use of reference executions (executions whose conditions are similar to those of the target) to help identify the symptoms and causes of performance anomalies. First, to identify anomaly symptoms, we construct change profiles that probabilistically characterize the expected performance deviations between target and reference executions. By synthesizing several single-parameter change profiles, we can scalably identify anomalous reference-to-target changes in a complex system with multiple execution parameters. Second, to narrow the scope of root cause analysis, we identify as anomaly-related those low-level system metrics that manifest very differently between target and reference executions. Our approach requires little expert knowledge and no detailed models of system internals, so it can be easily deployed. Using empirical case studies on the Linux I/O subsystem and a J2EE-based distributed online service, we demonstrate our approach's effectiveness in identifying performance anomalies over a wide range of execution conditions and across multiple system software versions. In particular, we discovered five previously unknown causes of performance anomalies in the Linux 2.6.23 kernel. Additionally, our preliminary results suggest that online anomaly detection and system reconfiguration may help complex online systems avoid performance anomalies.
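The two steps in the abstract can be illustrated with a short Python sketch. It is not the authors' implementation. It assumes that each execution's performance is a single throughput number, that a change profile is simply the list of log throughput ratios observed for reference-target pairs differing in one parameter, and that per-parameter log changes are independent and additive when profiles are combined. The metric filter ranks low-level metrics by a one-dimensional Earth Mover's Distance between their reference and target samples. The distance choice, the scaling, the `alpha` threshold, and all function names (`tail_probability`, `combine_profiles`, `is_anomalous`, `emd_1d`, `rank_metrics`) are hypothetical.

```python
import math
from bisect import bisect_right
from itertools import product


def tail_probability(profile, observed):
    """Empirical P(|change| >= |observed|) under a change profile.

    profile: non-empty list of log performance ratios (target / reference)
    seen for pairs that differ in the profiled parameter(s).
    A small result means the observed change is unexpected.
    """
    hits = sum(1 for c in profile if abs(c) >= abs(observed))
    return hits / len(profile)


def combine_profiles(*profiles):
    """Synthesize a multi-parameter profile from single-parameter ones.

    ASSUMPTION: per-parameter log changes are independent and additive,
    so every combination of one sample per profile is summed.
    """
    return [sum(parts) for parts in product(*profiles)]


def is_anomalous(ref_perf, target_perf, profiles, alpha=0.05):
    """Flag a reference-to-target change that the profiles make improbable.

    alpha is a hypothetical significance threshold.
    """
    observed = math.log(target_perf / ref_perf)
    return tail_probability(combine_profiles(*profiles), observed) < alpha


def emd_1d(a, b):
    """1-D Earth Mover's (Wasserstein-1) distance between two samples,
    computed as the integral of |F_a(x) - F_b(x)| over x."""
    a, b = sorted(a), sorted(b)
    points = sorted(set(a) | set(b))
    total = 0.0
    for left, right in zip(points, points[1:]):
        # Both empirical CDFs are constant on [left, right).
        fa = bisect_right(a, left) / len(a)
        fb = bisect_right(b, left) / len(b)
        total += abs(fa - fb) * (right - left)
    return total


def rank_metrics(reference, target, top_k=3):
    """Rank low-level metrics by how differently they manifest in the target.

    reference / target: dict mapping metric name -> list of samples.
    Each metric is scaled by its pooled maximum magnitude so that
    distances are comparable across units (an assumption).
    """
    scores = {}
    for name in reference.keys() & target.keys():
        scale = max(abs(v) for v in reference[name] + target[name]) or 1.0
        scores[name] = emd_1d([v / scale for v in reference[name]],
                              [v / scale for v in target[name]])
    return sorted(scores, key=scores.get, reverse=True)[:top_k]
```

For example, with two single-parameter profiles `[-0.1, 0.0, 0.05, 0.1, -0.05]` and `[0.0, 0.02, -0.02, 0.04, -0.04]`, a throughput drop from 100 to 50 lies outside every combined change and is flagged, while a rise from 100 to 103 is not, because 20 of the 25 combined changes are at least that large. Enumerating the full product grows exponentially with the number of changed parameters; drawing random samples from each profile would be a cheaper stand-in when many parameters differ.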