The Vision of Autonomic Computing
Computer
An analytical model for multi-tier internet services and its applications
SIGMETRICS '05 Proceedings of the 2005 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Tracking Probabilistic Correlation of Monitoring Data for Fault Detection in Complex Systems
DSN '06 Proceedings of the International Conference on Dependable Systems and Networks
Modeling and Tracking of Transaction Flow Dynamics for Fault Detection in Complex Systems
IEEE Transactions on Dependable and Secure Computing
Magpie: online modelling and performance-aware systems
HOTOS'03 Proceedings of the 9th conference on Hot Topics in Operating Systems - Volume 9
OSDI'04 Proceedings of the 6th conference on Symposium on Opearting Systems Design & Implementation - Volume 6
A comparative study of pairwise regression techniques for problem determination
CASCON '07 Proceedings of the 2007 conference of the center for advanced studies on Collaborative research
Information-theoretic modeling for tracking the health of complex software systems
CASCON '08 Proceedings of the 2008 conference of the center for advanced studies on collaborative research: meeting of minds
High speed and robust event correlation
IEEE Communications Magazine
Detecting application-level failures in component-based Internet services
IEEE Transactions on Neural Networks
Leveraging many simple statistical models to adaptively monitor software systems
ISPA'07 Proceedings of the 5th international conference on Parallel and Distributed Processing and Applications
System monitoring with metric-correlation models: problems and solutions
ICAC '09 Proceedings of the 6th international conference on Autonomic computing
Hi-index | 0.00 |
Modern software systems expose management metrics to help track their health. Recently, it was demonstrated that correlations among these metrics allow faults to be detected and their causes localized. In particular, linear regression models have been used to capture metric correlations. We show that for many pairs of correlated metrics in software systems, such as those based on Java Enterprise Edition (JavaEE), the variance of the predicted variable is not constant. This behaviour violates the assumptions of linear regression, and we show that these models may produce inaccurate results. In this paper, leveraging insight from the system behaviour, we employ an efficient variant of linear regression to capture the non-constant variance. We show that this variant captures metric correlations, while taking the changing residual variance into consideration. We explore potential causes underlying this behaviour, and we construct and validate our models using a realistic multi-tier enterprise application. Using a set of 50 fault-injection experiments, we show that we can detect all faults without any false alarm.