The Vision of Autonomic Computing
Computer
Vertical profiling: understanding the behavior of object-priented applications
OOPSLA '04 Proceedings of the 19th annual ACM SIGPLAN conference on Object-oriented programming, systems, languages, and applications
Tracking Probabilistic Correlation of Monitoring Data for Fault Detection in Complex Systems
DSN '06 Proceedings of the International Conference on Dependable Systems and Networks
Modeling and Tracking of Transaction Flow Dynamics for Fault Detection in Complex Systems
IEEE Transactions on Dependable and Secure Computing
Making scheduling "cool": temperature-aware workload placement in data centers
ATEC '05 Proceedings of the annual conference on USENIX Annual Technical Conference
OSDI'04 Proceedings of the 6th conference on Symposium on Opearting Systems Design & Implementation - Volume 6
Automatic misconfiguration troubleshooting with peerpressure
OSDI'04 Proceedings of the 6th conference on Symposium on Opearting Systems Design & Implementation - Volume 6
A comparative study of pairwise regression techniques for problem determination
CASCON '07 Proceedings of the 2007 conference of the center for advanced studies on Collaborative research
Snitch: interactive decision trees for troubleshooting misconfigurations
SYSML'07 Proceedings of the 2nd USENIX workshop on Tackling computer systems problems with machine learning techniques
Fingerpointing correlated failures in replicated systems
SYSML'07 Proceedings of the 2nd USENIX workshop on Tackling computer systems problems with machine learning techniques
Detecting application-level failures in component-based Internet services
IEEE Transactions on Neural Networks
Leveraging many simple statistical models to adaptively monitor software systems
ISPA'07 Proceedings of the 5th international conference on Parallel and Distributed Processing and Applications
NAP: a building block for remediating performance bottlenecks via black box network analysis
ICAC '09 Proceedings of the 6th international conference on Autonomic computing
A CIM-based framework to manage monitoring adaptability
Proceedings of the 8th International Conference on Network and Service Management
Performance optimization of deployed software-as-a-service applications
Journal of Systems and Software
Hi-index | 0.00 |
To ensure high availability, self-managing systems require self-monitoring and a system model against which to analyze monitoring data. Characterizing relationships between system metrics has been shown to model simple multi-tier transaction systems effectively, enabling failure detection and fault diagnosis. In this paper we show how to extend this invariant metric-relationships approach to clustered multi-tier systems. We show through analysis and experimentation that naive application of the approach increases cost dramatically while reducing diagnosis accuracy. We demonstrate that randomization at the load balancer during the invariant-identification phase will improve diagnosis accuracy, though it neither completely eliminates the problem nor reduces the cost; indeed, it may increase the cost, as this approach will require a long learning phase to remove all accidental correlations. Finally, we argue that knowing the system structure is necessary to effectively apply invariants to the clustered environment.