What do our computer systems do all day? How do we make sure they continue doing it when failures occur? Traditional approaches to answering these questions often involve in-band monitoring agents. However, in-band agents suffer from several drawbacks: they must be written or customized for every workload (operating system and possibly also application), they are potential security liabilities, and they are themselves affected by adverse conditions in the monitored systems. Virtualization technology makes it possible to encapsulate an entire operating system or application instance within a virtual object that can then be easily monitored and manipulated without any knowledge of the contents or behavior of that object. This can be done out-of-band, using general-purpose agents that do not reside inside the object and hence are not affected by its behavior. This paper describes Vigilant, a novel way of monitoring virtual machines for problems. Vigilant requires no specialized agents inside the virtual object it is monitoring. Instead, it uses the hypervisor to directly monitor the resource requests and utilization of the object. Machine learning methods are then used to analyze the readings. Our experimental results show that problems can be detected out-of-band with high accuracy. Using Vigilant, we demonstrate that out-of-band monitoring using virtualization and machine learning can accurately identify faults in the guest OS while avoiding the many pitfalls associated with in-band monitoring.