Vigilant: out-of-band detection of failures in virtual machines

Authors:
Dan Pelleg;Muli Ben-Yehuda;Rick Harper;Lisa Spainhower;Tokunbo Adeshiyan
Venue:
ACM SIGOPS Operating Systems Review
Year:
2008

Abstract

What do our computer systems do all day? How do we make sure they continue doing it when failures occur? Traditional approaches to answering these questions often involve in-band monitoring agents. However, in-band agents suffer from several drawbacks: they must be written or customized for every workload (operating system and possibly also application), they are potential security liabilities, and they are themselves affected by adverse conditions in the monitored systems. Virtualization technology makes it possible to encapsulate an entire operating system or application instance within a virtual object that can then be easily monitored and manipulated without any knowledge of the contents or behavior of that object. This can be done out-of-band, using general-purpose agents that do not reside inside the object and hence are not affected by its behavior. This paper describes Vigilant, a novel way of monitoring virtual machines for problems. Vigilant requires no specialized agents inside the virtual object it is monitoring. Instead, it uses the hypervisor to directly monitor the resource requests and utilization of the object. Machine learning methods are then used to analyze the readings. Our experimental results show that problems can be detected out-of-band with high accuracy. Using Vigilant, we demonstrate that out-of-band monitoring using virtualization and machine learning can accurately identify faults in the guest OS while avoiding the many pitfalls associated with in-band monitoring.
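
The general idea of analyzing hypervisor-side resource readings can be illustrated with a minimal Python sketch. It is not Vigilant's implementation: the abstract does not say which features or learning methods are used, so the three features, the sample values, the z-score detector, the threshold, and every function name below are assumptions chosen only for illustration.

```python
from statistics import mean, stdev

# Hypothetical per-interval readings a hypervisor might expose for one guest:
# (cpu_percent, disk_ops_per_sec, net_packets_per_sec). All values are made up.
HEALTHY_BASELINE = [
    (22.0, 140.0, 310.0),
    (25.0, 150.0, 298.0),
    (19.0, 135.0, 305.0),
    (24.0, 145.0, 320.0),
    (21.0, 138.0, 300.0),
]


def fit_baseline(samples):
    """Return per-feature (mean, standard deviation) from healthy samples."""
    columns = list(zip(*samples))
    return [(mean(col), stdev(col)) for col in columns]


def anomaly_score(model, reading):
    """Largest absolute z-score across features; higher means more unusual."""
    # max(sigma, 1e-9) guards against a feature that never varied in training.
    return max(abs(x - mu) / max(sigma, 1e-9)
               for x, (mu, sigma) in zip(reading, model))


def is_suspect(model, reading, threshold=4.0):
    """Flag a reading whose score exceeds an arbitrary illustrative threshold."""
    return anomaly_score(model, reading) > threshold


# Train on readings collected outside the guest while it was known to be healthy.
model = fit_baseline(HEALTHY_BASELINE)

normal_reading = (23.0, 142.0, 307.0)
hung_guest_reading = (0.0, 0.0, 0.0)  # a guest that stopped requesting resources

print(is_suspect(model, normal_reading))
print(is_suspect(model, hung_guest_reading))
```

Nothing in this sketch runs inside the guest: the detector only sees numbers gathered from outside, which is the property the abstract attributes to out-of-band monitoring. A real system would replace the hard-coded tuples with readings from the hypervisor and the z-score rule with a trained classifier.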