Leveraging many simple statistical models to adaptively monitor software systems

Authors:
Mohammad A. Munawar;Paul A. S. Ward
Affiliations:
Department of Electrical and Computer Engineering, University of Waterloo, 200 University Avenue, Waterloo, Ontario, N2L 3G1, Canada;Department of Electrical and Computer Engineering, University of Waterloo, 200 University Avenue, Waterloo, Ontario, N2L 3G1, Canada
Venue:
International Journal of High Performance Computing and Networking
Year:
2011

Abstract

Ensuring that a software system meets its objectives requires continuous monitoring. In practice, monitoring is either insufficient to detect and diagnose failures effectively or too costly to use in production. An alternative is adaptive monitoring: the system is monitored at a minimal level to assess its health and, if a problem is suspected, the monitoring level is automatically increased to diagnose the fault. To model the system at different monitoring levels, we employ statistical techniques to identify stable relationships in the monitored data. These relationships characterise normal operation and can help detect anomalies. We describe our approach in the context of a J2EE-based system and show that adaptive monitoring is a cost-effective alternative to continuous detailed monitoring. We inject 29 different faults and show that we detect the fault in 80% of cases and shortlist the faulty component in 65% of the detected cases.
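
As a rough illustration of the idea in the abstract, the sketch below fits a simple linear model to every pair of metrics, keeps only the pairs whose relationship is stable on fault-free data, and raises the monitoring level when a kept relationship is violated. It is a minimal sketch under stated assumptions, not the authors' implementation: ordinary least squares is used here as a stand-in for whatever statistical techniques the paper employs, and all names (PairModel, learn_models, next_level), thresholds (min_r2, slack), metric names, and numbers are hypothetical.

```python
from dataclasses import dataclass
from itertools import combinations
from statistics import mean


@dataclass
class PairModel:
    """Hypothetical container for one learned relationship y = slope * x + intercept."""
    x: str
    y: str
    slope: float
    intercept: float
    tolerance: float  # largest absolute residual still treated as normal


def fit_pair(xs, ys):
    """Ordinary least squares fit; returns (slope, intercept, r2) or None if a metric is constant."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx, (sxy * sxy) / (sxx * syy)


def learn_models(history, min_r2=0.9, slack=3.0):
    """Keep metric pairs whose linear fit on fault-free data reaches min_r2 (assumed threshold)."""
    models = []
    for x, y in combinations(sorted(history), 2):
        fit = fit_pair(history[x], history[y])
        if fit is None or fit[2] < min_r2:
            continue
        slope, intercept, _ = fit
        worst = max(abs(yv - (slope * xv + intercept))
                    for xv, yv in zip(history[x], history[y]))
        # Assumed rule: tolerate residuals up to `slack` times the worst training residual.
        models.append(PairModel(x, y, slope, intercept, slack * worst + 1e-9))
    return models


def violated(models, sample):
    """Return the models whose relationship does not hold for this sample."""
    return [m for m in models
            if m.x in sample and m.y in sample
            and abs(sample[m.y] - (m.slope * sample[m.x] + m.intercept)) > m.tolerance]


def next_level(level, models, sample, max_level=2):
    """Escalate one monitoring level on any violation; otherwise return to the minimal level 0."""
    return min(level + 1, max_level) if violated(models, sample) else 0


# Hypothetical fault-free training data: db_calls tracks requests, heap_mb does not.
history = {
    "requests": [10, 20, 30, 40, 50],
    "db_calls": [21, 39, 61, 80, 99],
    "heap_mb": [300, 120, 310, 150, 290],
}
models = learn_models(history)
healthy = {"requests": 35, "db_calls": 70, "heap_mb": 200}
faulty = {"requests": 35, "db_calls": 140, "heap_mb": 200}
print(next_level(0, models, healthy), next_level(0, models, faulty))
```

In this sketch the violated models also hint at where to look: the metrics that appear in them are natural candidates for the more detailed monitoring level. Dropping straight back to level 0 after a single healthy sample is a simplification; a real policy would likely require several consecutive healthy samples first.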