Internet service performance failure detection

Authors:
Amy Ward; Peter Glynn; Kathy Richardson
Affiliations:
Engineering Economic Systems & Operations Research Department, Stanford University, Stanford, CA; Engineering Economic Systems & Operations Research Department, Stanford University, Stanford, CA; Western Research Labs, Digital Equipment Corporation, Palo Alto, CA
Venue:
ACM SIGMETRICS Performance Evaluation Review
Year:
1998

Abstract

The increasing complexity of computer networks and our increasing dependence on them mean that enforcing reliability requirements is both more challenging and more critical. The expansion of network services to include both traditional interconnect services and user-oriented services such as the web and email has guaranteed both the increased complexity of networks and the increased importance of their performance. The first step toward increasing reliability is early detection of network performance failures. Here we consider the applicability of statistical model frameworks under the most general assumptions possible. Using measurements from corporate proxy servers, we test the framework against real-world failures. The results of these experiments show that we can detect failures, but only with a tradeoff in warning time: either we miss early warning signs or we report some false warnings. Finally, we offer insight into the problem of failure diagnosis.