How to keep your head above water while detecting errors

Authors: Ignacio Laguna, Fahad A. Arshad, David M. Grothe, Saurabh Bagchi
Affiliation: Purdue University (all authors)
Venue: Proceedings of the 10th ACM/IFIP/USENIX International Conference on Middleware
Year: 2009

Abstract

Today's distributed systems need runtime error detection to catch errors arising from software bugs, hardware errors, or unexpected operating conditions. A prominent class of error detection techniques operates in a stateful manner, i.e., it keeps track of the state of the application being monitored and then matches state-based rules. Large-scale distributed applications generate a high volume of messages that can overwhelm the capacity of a stateful detection system. An existing approach to handling this is to randomly sample the messages and process only a subset. However, this approach leads to non-determinism with respect to the detection system's view of what state the application is in, which in turn degrades the quality of detection. We present an intelligent sampling algorithm and a Hidden Markov Model (HMM)-based algorithm to select the messages that the detection system processes and to determine the application states such that the non-determinism is minimized. We also present a mechanism for selectively triggering computationally intensive rules, based on a lightweight check of whether the rule is likely to be flagged. We demonstrate the techniques in a detection system called Monitor applied to a J2EE multi-tier application. We empirically evaluate the performance of Monitor under different load conditions and error scenarios and compare it to a previous system called Pinpoint.
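To make the idea of inferring application state from a sampled message stream more concrete, the following is a minimal sketch of standard Viterbi decoding over a toy HMM. It is not the algorithm from the paper, whose details the abstract does not give. The state names (`IDLE`, `AUTH`, `QUERY`), the message types, and every probability are hypothetical values chosen only for illustration.

```python
from typing import Dict, List

# Hypothetical toy model: application states and the message types they emit.
STATES = ["IDLE", "AUTH", "QUERY"]
START_P = {"IDLE": 0.6, "AUTH": 0.3, "QUERY": 0.1}
TRANS_P = {
    "IDLE": {"IDLE": 0.2, "AUTH": 0.7, "QUERY": 0.1},
    "AUTH": {"IDLE": 0.1, "AUTH": 0.2, "QUERY": 0.7},
    "QUERY": {"IDLE": 0.5, "AUTH": 0.1, "QUERY": 0.4},
}
EMIT_P = {
    "IDLE": {"login": 0.1, "select": 0.1, "logout": 0.8},
    "AUTH": {"login": 0.8, "select": 0.1, "logout": 0.1},
    "QUERY": {"login": 0.1, "select": 0.8, "logout": 0.1},
}


def viterbi(
    observations: List[str],
    states: List[str],
    start_p: Dict[str, float],
    trans_p: Dict[str, Dict[str, float]],
    emit_p: Dict[str, Dict[str, float]],
) -> List[str]:
    """Return the most likely state sequence for the observed messages."""
    # best[t][s] = (probability of the best path ending in s at step t,
    #               predecessor state on that path)
    best = [{s: (start_p[s] * emit_p[s][observations[0]], None) for s in states}]
    for obs in observations[1:]:
        prev = best[-1]
        column = {}
        for s in states:
            column[s] = max(
                ((prev[p][0] * trans_p[p][s] * emit_p[s][obs], p) for p in states),
                key=lambda candidate: candidate[0],
            )
        best.append(column)

    # Backtrack from the most probable final state.
    path = [max(states, key=lambda s: best[-1][s][0])]
    for column in reversed(best[1:]):
        path.append(column[path[-1]][1])
    path.reverse()
    return path


# Full stream versus a stream in which sampling dropped the middle message.
full_path = viterbi(["login", "select", "logout"], STATES, START_P, TRANS_P, EMIT_P)
sampled_path = viterbi(["login", "logout"], STATES, START_P, TRANS_P, EMIT_P)
print(full_path)
print(sampled_path)
```

With the full stream, the decoder passes through `QUERY`; with the sampled stream it never sees evidence of that state. This illustrates, under the toy assumptions above, why dropped messages leave the detector uncertain about the application's state and why a probabilistic model is one way to pick the most likely interpretation.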