Stable correlation models are effective in detecting errors in complex software systems. However, most studies assume a specific mathematical form, typically linear, for the underlying correlations. In practice, more complex non-linear relationships exist between metrics. Moreover, correlated metrics tend to form clusters rather than isolated pairs. These clusters provide additional information for error detection and offer opportunities for optimization. We address these issues by adopting normalized mutual information (NMI) as a similarity measure between metrics. We also use the entropy of the metrics in each cluster to monitor system state. Our approach does not require learning specific correlation models, thus reducing computational overhead. We have implemented the proposed approach and show, through experiments with a multi-tier enterprise software system, that it is effective. Our evaluation shows that (i) stable non-linear correlations exist in practice; (ii) the entropy of system metrics in clusters can efficiently detect anomalies caused by faults and provide information for diagnosis; and (iii) the approach detects errors that were not captured by previous linear-correlation approaches.
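To make the two quantities named in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes metrics are discretized into equal-width bins, that NMI is normalized as I(X;Y) / sqrt(H(X) H(Y)) (one common choice; the abstract does not say which normalization is used), and that the entropy of a cluster is the joint entropy of its discretized metrics. All function names, the bin count, and the synthetic data are hypothetical.

```python
import math
from collections import Counter


def discretize(values, bins=8):
    """Map real-valued samples to equal-width bin indices (assumed preprocessing)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / bins
    # The maximum value would land in bin `bins`, so clamp it into the last bin.
    return [min(int((v - lo) / width), bins - 1) for v in values]


def entropy(symbols):
    """Shannon entropy, in bits, of a sequence of hashable symbols."""
    n = len(symbols)
    return -sum((c / n) * math.log2(c / n) for c in Counter(symbols).values())


def nmi(xs, ys):
    """Normalized mutual information I(X;Y) / sqrt(H(X) * H(Y)), nominally in [0, 1]."""
    hx, hy = entropy(xs), entropy(ys)
    if hx == 0 or hy == 0:
        return 0.0  # a constant metric shares no information with anything
    mutual_info = hx + hy - entropy(list(zip(xs, ys)))
    return mutual_info / math.sqrt(hx * hy)


def cluster_entropy(cluster):
    """Joint entropy of a cluster, given one discretized series per metric (assumed definition)."""
    return entropy(list(zip(*cluster)))


def pearson(xs, ys):
    """Pearson linear correlation coefficient, included only for contrast."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Synthetic, hypothetical data: metric_b depends on metric_a quadratically.
metric_a = list(range(-20, 21))
metric_b = [v * v for v in metric_a]

print("Pearson r:", pearson(metric_a, metric_b))
print("NMI:", nmi(discretize(metric_a), discretize(metric_b)))
```

On this synthetic pair the linear coefficient vanishes by symmetry while the NMI stays well above zero, which is the kind of relationship a linear-correlation model would miss. NMI estimated from few samples is biased upward, so any threshold for declaring two metrics correlated, or for flagging a change in cluster entropy, would need to be calibrated on fault-free data.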