Configuration debugging as search: finding the needle in the haystack

  • Authors:
  • Andrew Whitaker;Richard S. Cox;Steven D. Gribble

  • Affiliations:
  • University of Washington;University of Washington;University of Washington

  • Venue:
  • OSDI'04 Proceedings of the 6th conference on Symposium on Opearting Systems Design & Implementation - Volume 6
  • Year:
  • 2004

Quantified Score

Hi-index 0.00

Visualization

Abstract

This work addresses the problem of diagnosing configuration errors that cause a system to function incorrectly. For example, a change to the local firewall policy could cause a network-based application to malfunction. Our approach is based on searching across time for the instant the system transitioned into a failed state. Based on this information, a troubleshooter or administrator can deduce the cause of failure by comparing system state before and after the failure. We present the Chronus tool, which automates the task of searching for a failure-inducing state change. Chronus takes as input a user-provided software probe, which differentiates between working and nonworking states. Chronus performs "time travel" by booting a virtual machine off the system's disk state as it existed at some point in the past. By using binary search, Chronus can find the fault point with effort that grows logarithmically with log size. We demonstrate that Chronus can diagnose a range of common configuration errors for both client-side and server-side applications, and that the performance overhead of the tool is not prohibitive.