Dependability Analysis of a High-Speed Network Using Software-Implemented Fault Injection and Simulated Fault Injection

  • Authors:
  • David T. Stott; Greg Ries; Mei-Chen Hsueh; Ravishankar K. Iyer

  • Affiliations:
  • Univ. of Illinois at Urbana-Champaign, Urbana; Chromatic Research, Inc., Sunnyvale, CA; Digital Equipment Corp., Marlborough, MA; Univ. of Illinois at Urbana-Champaign, Urbana

  • Venue:
  • IEEE Transactions on Computers
  • Year:
  • 1998

Abstract

This paper presents a dependability study of high-speed, switched Local Area Networks (LANs), using Myrinet (with theoretical speeds of 2.56 Gbps) as an example testbed. The study uses the results of two fault injection methods, simulated fault injection and software-implemented fault injection (SWIFI), to analyze the application-level impact of transient faults injected into the network interface hardware. The observed errors include dropped or corrupted messages, host interface or host resets, and local or remote host interface hangs. The paper presents the study in two parts. First, the results from the SWIFI method in the real system are used as a basis to validate the simulation and to identify the major factors leading to differences between the methods. A comparison between the two injection methods shows that they agree for 83 percent of the fault injections. The results, however, vary greatly depending on the fault type considered. The study also analyzes the effects of varying the workload intensity, the host platform, and the interface function targeted by the injection. For example, the analysis shows that the function targeted has a significant impact on the fault activation rate. Finally, the study identifies two mechanisms by which faults may propagate from the interface to other parts of the network; in one example, this propagation caused the interface's host computer to reboot, while in another it caused a remote interface in the network to hang.
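
As an illustration of the kind of comparison the abstract describes, the following minimal Python sketch computes how often two injection methods report the same outcome for paired injections, overall and per fault type. It is not the authors' tooling: the function names, the record layout, the fault-type labels, and the sample outcomes are all hypothetical and chosen only for the example.

```python
from collections import Counter


def agreement_rate(swifi_outcomes, sim_outcomes):
    """Fraction of paired injections for which both methods saw the same outcome."""
    if len(swifi_outcomes) != len(sim_outcomes):
        raise ValueError("outcome lists must be paired one-to-one")
    if not swifi_outcomes:
        raise ValueError("no injections to compare")
    matches = sum(a == b for a, b in zip(swifi_outcomes, sim_outcomes))
    return matches / len(swifi_outcomes)


def agreement_by_fault_type(records):
    """Per-fault-type agreement.

    records: iterable of (fault_type, swifi_outcome, sim_outcome) tuples.
    Returns a dict mapping each fault type to its agreement fraction.
    """
    totals = Counter()
    matches = Counter()
    for fault_type, swifi, sim in records:
        totals[fault_type] += 1
        if swifi == sim:
            matches[fault_type] += 1
    return {ft: matches[ft] / totals[ft] for ft in totals}


# Hypothetical sample data (not taken from the paper).
records = [
    ("bit_flip_register", "dropped_message", "dropped_message"),
    ("bit_flip_register", "no_effect", "no_effect"),
    ("bit_flip_memory", "interface_hang", "no_effect"),
    ("bit_flip_memory", "corrupted_message", "corrupted_message"),
]

overall = agreement_rate(
    [swifi for _, swifi, _ in records],
    [sim for _, _, sim in records],
)
per_type = agreement_by_fault_type(records)
print(overall)
print(per_type)
```

Breaking the agreement down by fault type, as in the second function, is what would expose the situation the abstract reports, where a high overall agreement can hide large differences between individual fault types.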