Vigilant: out-of-band detection of failures in virtual machines

  • Authors:
  • Dan Pelleg; Muli Ben-Yehuda; Rick Harper; Lisa Spainhower; Tokunbo Adeshiyan

  • Venue:
  • ACM SIGOPS Operating Systems Review
  • Year:
  • 2008

Abstract

What do our computer systems do all day? How do we make sure they continue doing it when failures occur? Traditional approaches to answering these questions often involve in-band monitoring agents. However, in-band agents suffer from several drawbacks: they need to be written or customized for every workload (operating system and possibly also application), they are potential security liabilities, and they are themselves affected by adverse conditions in the monitored systems. Virtualization technology makes it possible to encapsulate an entire operating system or application instance within a virtual object that can then be easily monitored and manipulated without any knowledge of the contents or behavior of that object. This can be done out-of-band, using general-purpose agents that do not reside inside the object and hence are not affected by its behavior. This paper describes Vigilant, a novel way of monitoring virtual machines for problems. Vigilant requires no specialized agents inside the virtual object it is monitoring. Instead, it uses the hypervisor to directly monitor the resource requests and utilization of the object. Machine learning methods are then used to analyze the readings. Our experimental results show that problems can be detected out-of-band with high accuracy. Using Vigilant, we demonstrate that out-of-band monitoring using virtualization and machine learning can accurately identify faults in the guest OS while avoiding the many pitfalls associated with in-band monitoring.
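To make the out-of-band idea concrete, the following is a minimal sketch and not Vigilant's actual method. The abstract does not say which machine learning methods or hypervisor counters are used, so a simple per-metric z-score test stands in for the learning step. The metric names (`cpu_pct`, `io_ops`), the function names, and the threshold are hypothetical. The only input is a series of resource readings that a hypervisor-side collector is assumed to supply; nothing runs inside the guest.

```python
from statistics import mean, stdev


def fit_baseline(healthy_readings):
    """Learn a per-metric (mean, standard deviation) from readings
    collected while the guest was known to be healthy.

    Each reading is a dict of hypothetical metric names to numbers,
    assumed to come from the hypervisor rather than from the guest.
    """
    metrics = healthy_readings[0].keys()
    baseline = {}
    for metric in metrics:
        values = [reading[metric] for reading in healthy_readings]
        baseline[metric] = (mean(values), stdev(values))
    return baseline


def is_anomalous(sample, baseline, threshold=3.0):
    """Flag a reading if any metric lies more than `threshold`
    standard deviations from its healthy mean (a stand-in for the
    paper's unspecified learning method)."""
    for metric, (mu, sigma) in baseline.items():
        deviation = abs(sample[metric] - mu)
        if sigma == 0:
            # Metric never varied while healthy: any change is suspicious.
            if deviation > 0:
                return True
        elif deviation / sigma > threshold:
            return True
    return False


if __name__ == "__main__":
    # Illustrative numbers only; these are not measurements from the paper.
    healthy = [
        {"cpu_pct": 40, "io_ops": 100},
        {"cpu_pct": 42, "io_ops": 110},
        {"cpu_pct": 38, "io_ops": 90},
        {"cpu_pct": 41, "io_ops": 105},
        {"cpu_pct": 39, "io_ops": 95},
    ]
    baseline = fit_baseline(healthy)
    print(is_anomalous({"cpu_pct": 41, "io_ops": 98}, baseline))
    print(is_anomalous({"cpu_pct": 100, "io_ops": 0}, baseline))
```

The detector only sees resource readings, so the same code applies to any guest operating system or application, which is the property the abstract attributes to out-of-band monitoring.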