Using Hidden Semi-Markov Models for Effective Online Failure Prediction

  • Authors:
  • Felix Salfner;Miroslaw Malek

  • Affiliations:
  • Humboldt-Universität zu Berlin;Humboldt-Universität zu Berlin

  • Venue:
  • SRDS '07 Proceedings of the 26th IEEE International Symposium on Reliable Distributed Systems
  • Year:
  • 2007

Abstract

Proactive handling of faults requires that the risk of upcoming failures be continuously assessed. One promising approach is online failure prediction, in which the current state of the system is evaluated in order to predict the occurrence of failures in the near future. More specifically, we focus on methods that use event-driven sources such as errors. We use Hidden Semi-Markov Models (HSMMs) for this purpose and demonstrate their effectiveness on field data from a commercial telecommunication system. For comparative analysis we selected three well-known failure prediction techniques: a straightforward method based on a reliability model, the Dispersion Frame Technique by Lin and Siewiorek, and the eventset-based method introduced by Vilalta et al. We assess and compare the methods in terms of precision, recall, F-measure, false-positive rate, and computing time. The experiments suggest that our HSMM approach is very effective for online failure prediction.
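
The evaluation metrics named in the abstract can be illustrated with a minimal sketch. It assumes the standard contingency-table definitions of precision, recall, F-measure (the harmonic mean of precision and recall), and false-positive rate; the abstract does not spell these out. The `Counts` type, the `prediction_metrics` function, and the example counts are hypothetical and are not taken from the paper's data.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    """Hypothetical contingency table for a failure predictor."""
    tp: int  # failure warnings followed by a real failure
    fp: int  # failure warnings with no failure following
    tn: int  # no warning and no failure
    fn: int  # failures that were not predicted


def safe_div(num: float, den: float) -> float:
    """Return num / den, or 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def prediction_metrics(c: Counts) -> dict:
    """Compute the four quality metrics under the standard definitions."""
    precision = safe_div(c.tp, c.tp + c.fp)
    recall = safe_div(c.tp, c.tp + c.fn)
    # F-measure: harmonic mean of precision and recall
    f_measure = safe_div(2 * precision * recall, precision + recall)
    # False-positive rate: share of non-failure cases that raised a warning
    fpr = safe_div(c.fp, c.fp + c.tn)
    return {
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "false_positive_rate": fpr,
    }


if __name__ == "__main__":
    # Illustrative counts only; not results from the paper.
    example = Counts(tp=8, fp=2, tn=85, fn=5)
    for name, value in prediction_metrics(example).items():
        print(f"{name}: {value:.4f}")
```

Reporting the false-positive rate alongside precision and recall is useful here because failures are rare events: a predictor can reach a low false-positive rate simply because non-failure cases dominate, while precision still shows how many of its warnings were correct.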