Current research and practice in proactive fault management

  • Authors:
  • Y. Li; Z. Lan

  • Affiliations:
  • Illinois Institute of Technology, Chicago, IL (both authors)

  • Venue:
  • International Journal of Computers and Applications
  • Year:
  • 2007

Abstract

Unlike rollback-recovery, proactive fault management takes preventive actions before failures occur. In this survey paper, we classify current research on proactive fault management into two broad categories: failure analysis and prediction, and proactive techniques. Analytical methods have been widely used to analyse and forecast continuous values, while data mining or machine learning methods are mostly suited to categorical data. Various proactive fault management systems have recently been developed, each exploring different proactive techniques to achieve its specific design goal. Our investigation shows that further research should be conducted in the context of high performance computing to enable efficient proactive fault management for emerging large-scale supercomputers.