FINE: A Fault Injection and Monitoring Environment for Tracing the UNIX System Behavior Under Faults

  • Authors:
  • Wei-lun Kao;Ravishankar K. Iyer;Dong Tang

  • Affiliations:
  • -;-;-

  • Venue:
  • IEEE Transactions on Software Engineering - Special issue on software reliability
  • Year:
  • 1993

Quantified Score

Hi-index 0.01

Visualization

Abstract

The authors present a fault injection and monitoring environment (FINE) as a tool to study fault propagation in the UNIX kernel. FINE injects hardware-induced software errors and software faults into the UNIX kernel and traces the execution flow and key variables of the kernel. FINE consists of a fault injector, a software monitor, a workload generator, a controller, and several analysis utilities. Experiments on SunOS 4.1.2 are conducted by applying FINE to investigate fault propagation and to evaluate the impact of various types of faults. Fault propagation models are built for both hardware and software faults. Transient Markov reward analysis is performed to evaluate the loss of performance due to an injected fault. Experimental results show that memory and software faults usually have a very long latency, while bus and CPU faults tend to crash the system immediately. About half of the detected errors are data faults, which are detected when the system is tries to access an unauthorized memory location. Only about 8% of faults propagate to other UNIX subsystems. Markov reward analysis shows that the performance loss incurred by bus faults and CPU faults is much higher than that incurred by software and memory faults. Among software faults, the impact of pointer faults is higher than that of nonpointer faults.