Proactive Fault Handling for System Availability Enhancement

  • Authors:
  • Felix Salfner; Miroslaw Malek

  • Affiliations:
  • Humboldt University Berlin, Germany; Humboldt University Berlin, Germany

  • Venue:
  • IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Workshop 16 - Volume 17
  • Year:
  • 2005

Abstract

Proactive fault handling combines prevention and repair actions with failure prediction techniques. We extend the standard availability formula by five key measures: (1) precision and (2) recall assess failure prediction, while failure handling is gauged by (3) prevention probability, (4) repair time improvement, and (5) the risk of introducing additional failures. We give a short survey of actions that are suited for combination with failure prediction and provide a procedure to estimate the five key measures. Altogether, this makes it possible to quantify the impact of proactive fault handling on system availability and may provide valuable input for system design.
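To make the five measures concrete, here is a minimal sketch in Python. It is a deliberately simplified model of our own and not the formula derived in the paper. It uses only the standard steady-state availability A = MTTF / (MTTF + MTTR) and textbook precision and recall, and it additionally assumes that every warning triggers an action, that a predicted failure which is not prevented is repaired k times faster, and that each action (on a true or false warning) induces an extra failure with probability P_r that takes a normal MTTR to repair. All names (`PfhParams`, `repair_speedup`, `p_risk`, and so on) are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PfhParams:
    """Hypothetical parameter set; names are illustrative, not taken from the paper."""
    precision: float       # p: fraction of warnings followed by a real failure
    recall: float          # r: fraction of failures that are predicted
    p_prevent: float       # P_p: probability that a predicted failure is prevented
    repair_speedup: float  # k >= 1: MTTR divided by repair time of a predicted failure
    p_risk: float          # P_r: probability that an action induces an extra failure


def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Precision and recall of a failure predictor from contingency counts."""
    return tp / (tp + fp), tp / (tp + fn)


def baseline_availability(mttf: float, mttr: float) -> float:
    """Standard steady-state availability A = MTTF / (MTTF + MTTR)."""
    return mttf / (mttf + mttr)


def proactive_availability(mttf: float, mttr: float, q: PfhParams) -> float:
    """Availability under an assumed, simplified model of proactive fault handling.

    Expected downtime per unit of uptime, in units of MTTR / MTTF, is the sum of:
      - unpredicted failures, repaired normally:            (1 - r)
      - predicted but not prevented, repaired k times faster: r * (1 - P_p) / k
      - failures induced by actions on all r / p warnings:   r * P_r / p
    With r = 0 this reduces to the baseline availability.
    """
    r, p = q.recall, q.precision
    factor = ((1 - r)
              + r * (1 - q.p_prevent) / q.repair_speedup
              + r * q.p_risk / p)
    return 1.0 / (1.0 + (mttr / mttf) * factor)


# Illustrative numbers only (hypothetical, not measurements from the paper).
p, r = precision_recall(tp=60, fp=15, fn=40)  # p = 0.8, r = 0.6
params = PfhParams(precision=p, recall=r, p_prevent=0.5,
                   repair_speedup=2.0, p_risk=0.01)
print(baseline_availability(1000.0, 10.0))
print(proactive_availability(1000.0, 10.0, params))
```

The design keeps each measure in its own additive term, so one can see directly how a low precision inflates the risk term (more false warnings mean more potentially harmful actions), while recall, prevention probability, and repair speedup shrink the downtime contributed by real failures.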