Structure from failure

  • Authors:
  • Ralf Herbrich; Thore Graepel; Brendan Murphy

  • Affiliations:
  • Microsoft Research Ltd., Cambridge, UK; Microsoft Research Ltd., Cambridge, UK; Microsoft Research Ltd., Cambridge, UK

  • Venue:
  • SYSML'07 Proceedings of the 2nd USENIX workshop on Tackling computer systems problems with machine learning techniques
  • Year:
  • 2007

Abstract

We investigate the problem of learning the dependencies among servers in large networks based on failure patterns in their up-time behaviour. We model up-times in terms of exponential distributions whose inverse lifetime parameters λ may vary with the state of other servers. Based on a conjugate Gamma prior over inverse lifetimes, we identify the most likely network configuration under the constraint that each node has at most one parent. The method can be viewed as a special case of learning a continuous time Bayesian network. Our approach enables us to easily incorporate existing expert prior knowledge. Furthermore, our method enjoys advantages over a state-of-the-art rule-based approach. We validate the approach on synthetic data and apply it to five years of data for a set of over 500 servers at a server farm of a major Microsoft web site.
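
The Gamma-exponential conjugacy mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the prior values alpha = beta = 1, and the summary of the data as (failure count, total up-time) pairs per parent state are all assumptions made for illustration. For an exponential up-time model with rate λ ~ Gamma(alpha, beta), the marginal likelihood of n failures observed over total up-time T has the closed form beta^alpha Γ(alpha + n) / (Γ(alpha) (beta + T)^(alpha + n)), so a candidate parent can be scored against the no-parent model by comparing log marginal likelihoods.

```python
from math import lgamma, log

def log_marginal(n_failures, exposure, alpha=1.0, beta=1.0):
    """Log marginal likelihood of `n_failures` failures over total up-time
    `exposure`, for an exponential model whose rate has a Gamma(alpha, beta)
    prior (beta is a rate parameter). Prior values are illustrative."""
    return (alpha * log(beta) - lgamma(alpha)
            + lgamma(alpha + n_failures)
            - (alpha + n_failures) * log(beta + exposure))

def parent_log_bayes_factor(stats_by_parent_state, alpha=1.0, beta=1.0):
    """Score a hypothetical candidate parent for one server.

    `stats_by_parent_state` maps each parent state (e.g. "up", "down") to a
    (n_failures, exposure) pair for the child while the parent was in that
    state. Returns log p(data | parent) - log p(data | no parent)."""
    # With a parent: one independent rate per parent state.
    with_parent = sum(log_marginal(n, t, alpha, beta)
                      for n, t in stats_by_parent_state.values())
    # Without a parent: a single rate for the pooled data.
    n_total = sum(n for n, _ in stats_by_parent_state.values())
    t_total = sum(t for _, t in stats_by_parent_state.values())
    return with_parent - log_marginal(n_total, t_total, alpha, beta)

# Made-up numbers: the child fails far more often while the parent is down.
dependent = {"up": (1, 100.0), "down": (10, 5.0)}
# Made-up numbers: the child's failure rate is the same in both parent states.
independent = {"up": (5, 50.0), "down": (5, 50.0)}

print(parent_log_bayes_factor(dependent))    # positive: parent model favoured
print(parent_log_bayes_factor(independent))  # negative: no-parent model favoured
```

Under the one-parent constraint, a score of this kind could be computed for every candidate parent of a server and the highest-scoring option (including "no parent") retained. Expert prior knowledge could enter through the choice of alpha and beta or through a prior over candidate parents, again as an assumption of this sketch.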