Quantitative system performance: computer system analysis using queueing network models
A guide to expert systems
Algorithms
SIGCOMM '88 Symposium proceedings on Communications architectures and protocols
Two Dimensional Time-Series for Anomaly Detection and Regulation in Adaptive Systems. In DSOM '02: Proceedings of the 13th IFIP/IEEE International Workshop on Distributed Systems: Operations and Management: Management Technologies for E-Commerce and E-Business Applications
Dynamic dependencies and performance improvement. In LISA'08: Proceedings of the 22nd conference on Large installation system administration conference
ICMPv6 Cumulative Path Traceback in Mobile Ad Hoc networks (MANET). In Proceedings of the 2006 conference on Advances in Intelligent IT: Active Media Technology 2006
Application of anomaly detection algorithms for detecting SYN flooding attacks. Computer Communications
A real-time system-adapted anomaly detector. Information Sciences: an International Journal
Multi-site scheduling with multiple job reservations and forecasting methods. In ISPA'06: Proceedings of the 4th international conference on Parallel and Distributed Processing and Applications
Selective resource characterization for evaluation of system dynamics. ACM SIGMETRICS Performance Evaluation Review
Performance troubleshooting in data centers: an annotated bibliography? ACM SIGOPS Operating Systems Review
Computer systems require monitoring to detect performance anomalies such as runaway processes, but problem detection and diagnosis is a complex task requiring skilled attention. Although human attention was never ideal for this task, as networks of computers grow larger and their interactions more complex, it falls far short. Existing computer-aided management systems require the administrator to manually specify fixed "trouble" thresholds. In this paper we report on an expert system that automatically sets thresholds, and detects and diagnoses performance problems on a network of Unix computers. Key to the success and scalability of this system are the time series models we developed to model the variations in workload on each host. Analysis of the load average records of 50 machines yielded models which show, for workstations with simulated problem injection, false positive and negative rates of less than 1%. The server machines most difficult to model still gave average false positive/negative rates of only 6%/32%. Observed values exceeding the expected range for a particular host cause the expert system to focus on that machine. There it applies tools with finer resolution and more discrimination, including per-command profiles gleaned from process accounting records. It makes one of 18 specific diagnoses and notifies the administrator, and optionally the user.
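The core idea in the abstract — fit a per-host model of normal load-average variation, then flag observations outside the model's expected range — can be sketched as follows. This is a minimal illustration using an exponentially weighted moving average (EWMA) of mean and variance as a stand-in predictor; the paper's actual per-host time series models, thresholds, and diagnosis logic are more sophisticated, and all names and parameters here are hypothetical.

```python
class LoadAnomalyDetector:
    """Per-host detector: tracks an EWMA estimate of the load average's
    mean and variance, and flags samples outside mean +/- k*sigma.
    (Illustrative stand-in for the paper's time series models.)"""

    def __init__(self, alpha=0.1, k=3.0):
        self.alpha = alpha   # smoothing factor for the EWMA updates
        self.k = k           # half-width of the expected range, in std devs
        self.mean = None     # running estimate of the typical load
        self.var = 0.0       # running estimate of its variance

    def update(self, load):
        """Feed one load-average sample; return True if it is anomalous."""
        if self.mean is None:            # first sample just seeds the model
            self.mean = load
            return False
        sigma = self.var ** 0.5
        anomalous = sigma > 0 and abs(load - self.mean) > self.k * sigma
        # Fold the new observation into the model.
        diff = load - self.mean
        self.mean += self.alpha * diff
        self.var = (1 - self.alpha) * (self.var + self.alpha * diff * diff)
        return anomalous


# Warm the detector up on ordinary workstation load, then inject a spike
# such as a runaway process would produce.
detector = LoadAnomalyDetector()
normal = [1.0, 1.2, 0.9, 1.1, 1.0, 1.3, 0.8, 1.1] * 5
flags = [detector.update(x) for x in normal]
runaway_flagged = detector.update(15.0)  # far outside the learned range
```

In the full system, a flag like `runaway_flagged` would not itself raise an alarm; it would trigger the finer-resolution tools (e.g. per-command profiles from process accounting) that narrow the event down to one of the specific diagnoses.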