Heteroscedastic models to track relationships between management metrics
IM'09 Proceedings of the 11th IFIP/IEEE international conference on Symposium on Integrated Network Management
Correlations among management metrics in software systems allow errors to be detected and their causes localized. Prior research shows that linear models can capture many of these correlations. However, our research shows that several factors may prevent linear models from accurately describing correlations, even when the underlying relationship is linear. Two common phenomena we have observed are relationships that evolve, typically with time, and heterogeneous variance of the correlated metrics. The two-variable linear models proposed thus far fail to capture these phenomena and therefore fail to describe system dynamics correctly. Often, these phenomena are caused by a missing variable. However, searching for three-variable correlations is O(n^3) for n metrics, which is costly for systems with many metrics. In this paper, we address these challenges by improving on two-variable Ordinary Least Squares regression models. We validate our models using a realistic Java Enterprise Edition application. Using fault-injection experiments, we show that our improved models capture system behavior accurately. On average, we detect errors within 8 sample periods of fault injection, which is less than half the time required by the current linear-model approach.
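As a rough illustration of the baseline that the abstract sets out to improve, the following sketch fits a plain two-variable Ordinary Least Squares model between two metrics and flags a sample whose residual is unusually large. It is not the paper's improved model. The metric names, the sample values, and the 3-sigma threshold `k` are all assumptions made for this example.

```python
import math

def fit_ols(xs, ys):
    """Fit y = a + b*x by ordinary least squares.

    Returns (a, b, sigma), where sigma is the residual standard deviation.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
    # Two parameters were estimated, so use n - 2 degrees of freedom.
    sigma = math.sqrt(sum(r * r for r in residuals) / (n - 2))
    return a, b, sigma

def is_anomalous(model, x, y, k=3.0):
    """Flag (x, y) if its residual exceeds k residual standard deviations.

    The threshold k is an assumption of this sketch, not a value from the paper.
    """
    a, b, sigma = model
    return abs(y - (a + b * x)) > k * sigma

# Hypothetical fault-free training samples: request rate vs. CPU utilization.
requests = [10, 20, 30, 40, 50, 60]
cpu = [5.1, 9.8, 15.2, 19.9, 25.1, 29.8]
model = fit_ols(requests, cpu)

print(is_anomalous(model, 35, 17.5))  # sample consistent with the fitted line
print(is_anomalous(model, 35, 40.0))  # sample far from the fitted line
```

A single constant `sigma` like this assumes homogeneous variance, which is exactly the assumption the abstract says often fails in practice. The cost argument can also be made concrete: for 100 metrics there are 4,950 metric pairs but 161,700 metric triples to examine.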