Software-Implemented Fault Injection Methodology for Design and Validation of System Fault Tolerance

  • Authors:
  • Raphael R. Some;Won S. Kim;Garen Khanoyan;Leslie Callum;Anil Agrawal;John J. Beahan

  • Venue:
  • DSN '01 Proceedings of the 2001 International Conference on Dependable Systems and Networks (formerly: FTCS)
  • Year:
  • 2001

Abstract

In this paper, we present our experience at the Jet Propulsion Laboratory (JPL) in developing a methodology and tool for Software-Implemented Fault Injection (SWIFI) into a parallel processing supercomputer that is being designed for use in next-generation space exploration missions. The fault injector uses software-based strategies to emulate the effects of radiation-induced transients occurring in the system hardware components. JPL's SWIFI tool set, called JIFI (JPL's Implementation of a Fault Injector), is being used, in conjunction with an appropriate system fault model, to evaluate candidate hardware and software fault tolerance architectures, determine the sensitivity of applications to faults, and measure the effectiveness of fault detection, isolation, and recovery strategies. JIFI has been validated to inject faults into user-specified CPU registers and memory regions with a uniform random distribution in location and time. Together with verifiers, classifiers, and run scripts, JIFI enables massive fault injection campaigns and statistical data analysis.
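
The abstract's core idea (faults drawn uniformly at random in location and time, then outcomes tallied over many runs) can be illustrated with a small sketch. This is not JIFI's implementation or interface: every name below (`Fault`, `sample_fault`, `inject`, `application`, `run_campaign`), the single-bit-flip fault model, the region address and size, and the two outcome classes are assumptions made for illustration. A `bytearray` stands in for target memory, and the toy application reads only half of the region so that some faults are masked.

```python
import random
from dataclasses import dataclass

# Hypothetical target region and run length, chosen only for this sketch.
REGION_START = 0x1000   # base address of the user-specified memory region
REGION_SIZE = 64        # region size in bytes
RUN_DURATION = 10.0     # length of one application run, in seconds


@dataclass(frozen=True)
class Fault:
    time: float    # when the fault would be triggered during the run
    address: int   # byte address inside the target region
    bit: int       # bit position inside that byte (0-7)


def sample_fault(rng, region_start, region_size, run_duration):
    """Draw one fault uniformly over time, byte address, and bit position."""
    return Fault(
        time=rng.uniform(0.0, run_duration),
        address=region_start + rng.randrange(region_size),
        bit=rng.randrange(8),
    )


def inject(memory, region_start, fault):
    """Emulate a transient by flipping one bit in the simulated memory."""
    memory[fault.address - region_start] ^= 1 << fault.bit


def application(memory):
    """Toy workload: it reads only the first half of the region."""
    return sum(memory[: len(memory) // 2])


def run_campaign(n_runs, seed=0):
    """Run n_runs single-fault experiments and classify each outcome."""
    rng = random.Random(seed)
    golden_memory = bytes(range(REGION_SIZE))
    golden_output = application(golden_memory)   # fault-free reference result
    counts = {"no_effect": 0, "corrupted": 0}
    for _ in range(n_runs):
        memory = bytearray(golden_memory)
        fault = sample_fault(rng, REGION_START, REGION_SIZE, RUN_DURATION)
        # The sampled time is only recorded here; a real injector would
        # trigger the fault at that point in the run.
        inject(memory, REGION_START, fault)
        same = application(memory) == golden_output
        counts["no_effect" if same else "corrupted"] += 1
    return counts
```

Calling `run_campaign(1000)` returns a dictionary of outcome counts. Because the toy workload ignores half of the region, roughly half of the injected faults should be classified as `no_effect`, which is the kind of sensitivity figure a real campaign would estimate with far more detailed verifiers and classifiers.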