Addressing software dependability with statistical and machine learning techniques

  • Authors:
  • Armando Fox

  • Affiliations:
  • Stanford University, Stanford, CA

  • Venue:
  • Proceedings of the 27th international conference on Software engineering
  • Year:
  • 2005

Quantified Score

Hi-index 0.00

Visualization

Abstract

Our ability to design and deploy large complex systems is outpacing our ability to understand their behavior. How do we detect and recover from "heisenbugs," which account for up to 40% of failures in complex Internet systems, without extensive application-specific coding? Which users were affected, and for how long? How do we diagnose and correct problems caused by configuration errors or operator errors? Although these problems are posed at a high level of abstraction, all we can usually measure directly are low-level behaviors---analogous to driving a car while looking through a magnifying glass. Machine learning can bridge this gap using techniques that learn "baseline" models automatically or semi-automatically, allowing the characterization and monitoring of systems whose structure is not well understood a priori. I'll discuss initial successes and future challenges in using machine learning for failure detection anbd diagnosis, configuration troubleshooting, attribution (which low-level properties appear to be correlated with an observed high-level effect such as decreased performance), and failure forecasting.