Ensuring that a software system meets its objectives requires continuous monitoring. In practice, however, monitoring is either insufficient to detect and diagnose failures effectively or too costly to use in production. An alternative is adaptive monitoring: the system is monitored at a minimal level to assess its health, and when a problem is suspected, the monitoring level is automatically increased to diagnose the fault. To model the system at different monitoring levels, we employ statistical techniques to identify stable relationships in the monitored data. These relationships characterise normal operation and can help detect anomalies. We describe our approach in the context of a J2EE-based system and show that adaptive monitoring is a cost-effective alternative to continuous detailed monitoring. We inject 29 different faults and show that we detect them in 80% of cases and shortlist the faulty component in 65% of the detected cases.
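The idea of stable relationships driving an adaptive monitoring level can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes pairwise linear regression as the statistical technique, and the metric names (`requests`, `db_calls`, `ejb_calls`), the R² cut-off of 0.9, the tolerance of three times the largest training residual, and the sample data are all hypothetical choices made for illustration.

```python
from collections import Counter
from itertools import combinations
from statistics import mean


def fit_line(xs, ys):
    """Least-squares fit y = a*x + b; returns (a, b, r_squared)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        return 0.0, my, 0.0  # a constant metric carries no relationship
    a = sxy / sxx
    return a, my - a * mx, (sxy * sxy) / (sxx * syy)


def learn_relationships(training, min_r2=0.9):
    """Keep metric pairs whose linear fit is strong on fault-free data.

    Assumption: tolerance = 3 x the largest training residual.
    """
    models = {}
    for x, y in combinations(sorted(training), 2):
        a, b, r2 = fit_line(training[x], training[y])
        if r2 >= min_r2:
            resid = [abs(yv - (a * xv + b))
                     for xv, yv in zip(training[x], training[y])]
            models[(x, y)] = (a, b, 3 * max(resid) + 1e-9)
    return models


def violated(models, sample):
    """Return the pairs whose learned relationship the sample breaks."""
    return [(x, y) for (x, y), (a, b, tol) in models.items()
            if x in sample and y in sample
            and abs(sample[y] - (a * sample[x] + b)) > tol]


def shortlist(pairs):
    """Rank metrics by how many violated relationships they appear in."""
    counts = Counter(m for pair in pairs for m in pair)
    return [m for m, _ in counts.most_common()]


class AdaptiveMonitor:
    """Checks few relationships normally, all of them once one breaks."""

    def __init__(self, training, minimal_metrics):
        self.models = {
            "minimal": learn_relationships(
                {m: training[m] for m in minimal_metrics}),
            "detailed": learn_relationships(training),
        }
        self.level = "minimal"

    def observe(self, sample):
        bad = violated(self.models[self.level], sample)
        if bad and self.level == "minimal":
            self.level = "detailed"  # escalate: a problem is suspected
        return bad


# Hypothetical fault-free training samples, one value per interval.
TRAINING = {
    "requests": [10, 20, 30, 40, 50],
    "db_calls": [21, 41, 62, 80, 101],
    "ejb_calls": [30, 61, 90, 119, 150],
}
```

In this sketch, a monitor built with `AdaptiveMonitor(TRAINING, ["requests", "db_calls"])` checks a single relationship until a sample breaks it, then switches to the detailed level, where the metric involved in the most violated relationships heads the shortlist of suspects. A real deployment would also need a rule for dropping back to the minimal level once the system looks healthy again, which is omitted here.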