Designing a resourceful fault-tolerance system

  • Authors:
  • Ray Giguette; Johnette Hassell

  • Affiliations:
  • Computer Science Department, Nicholls State University, P. O. Box 2168, Thibodaux, LA; Electrical Engineering and Computer Science Department, Tulane University, New Orleans, LA

  • Venue:
  • Journal of Systems and Software
  • Year:
  • 2002

Abstract

This paper examines the feasibility of creating a "resourceful" software fault-tolerance system. Current fault-tolerance methods typically replace a faulty module with a redundant backup version and make no attempt to assess and correct errors in the original module. Error-recovery options are therefore limited by the number of backup modules. In contrast, a resourceful system dynamically generates alternative error-correction strategies. Periodically, the system determines which of its pre-defined goals have not been met, then executes different strategies until those goals are achieved. We outline a resourceful fault-tolerance system that defines recovery goals and specifies separate detection and correction procedures for each goal. When errors are detected, various sequences of correction procedures are examined to identify those that meet the recovery goals. We examine implementation issues such as specifying recovery goals, creating recovery options, and reducing runtime overhead. We also describe a strategy that increases the efficiency of our method by planning each recovery before carrying it out, eliminating strategies expected to be unsuccessful, impractical, or cyclical.
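
The planning idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the breadth-first search, the dictionary-based state, and every name below (`unmet_goals`, `plan_recovery`, `clamp_index`, `recompute_checksum`, `max_steps`) are assumptions made for illustration. Each goal has a detection procedure (a predicate on the state) and the system has a set of correction procedures. The planner simulates sequences of corrections on a copy of the state, discards sequences that return to a state already seen (cyclical), and gives up on sequences longer than a step limit (impractical).

```python
from collections import deque


def unmet_goals(state, goals):
    """Run each goal's detection procedure; return names of goals not met."""
    return [name for name, check in goals.items() if not check(state)]


def plan_recovery(state, goals, corrections, max_steps=4):
    """Search for a sequence of corrections that satisfies every goal.

    Corrections are simulated on copies of the state, so nothing is
    changed until a complete plan has been found.
    """
    seen = {tuple(sorted(state.items()))}
    queue = deque([(dict(state), [])])
    while queue:
        current, plan = queue.popleft()
        if not unmet_goals(current, goals):
            return plan  # every recovery goal is met
        if len(plan) >= max_steps:
            continue  # too long to be practical (assumed limit)
        for name, correct in corrections.items():
            candidate = correct(dict(current))
            key = tuple(sorted(candidate.items()))
            if key in seen:
                continue  # cyclical or no progress: state already explored
            seen.add(key)
            queue.append((candidate, plan + [name]))
    return None  # no strategy within the limit meets the goals


# Hypothetical example: an out-of-range index and a stale checksum.
state = {"buffer_len": 12, "index": 15, "checksum_ok": 0}

goals = {
    "index_in_range": lambda s: s["index"] < s["buffer_len"],
    "checksum_valid": lambda s: s["checksum_ok"] == 1,
}

corrections = {
    "clamp_index": lambda s: {**s, "index": min(s["index"], s["buffer_len"] - 1)},
    # Recomputing the checksum only succeeds once the index is valid.
    "recompute_checksum": lambda s: {
        **s, "checksum_ok": 1 if s["index"] < s["buffer_len"] else 0
    },
}

plan = plan_recovery(state, goals, corrections)
print(plan)
```

In this example, recomputing the checksum first leaves the state unchanged, so that branch is pruned, and the planner settles on clamping the index before recomputing the checksum. A real system would also need cost estimates and a way to model corrections whose effects cannot be simulated exactly.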