DEPEND: A Simulation-Based Environment for System Level Dependability Analysis

  • Authors:
  • Kumar K. Goswami;Ravishankar K. Iyer;Luke Young

  • Affiliations:
  • OnLive! Technologies, Cupertino, CA;Univ. of Illinois, Urbana;Tandem Computers, Austin, TX

  • Venue:
  • IEEE Transactions on Computers
  • Year:
  • 1997

Quantified Score

Hi-index 14.99

Visualization

Abstract

The paper presents the rationale for a functional simulation tool, called DEPEND, which provides an integrated design and fault injection environment for system level dependability analysis. The paper discusses the issues and problems of developing such a tool, and describes how DEPEND tackles them. Techniques developed to simulate realistic fault scenarios, reduce simulation time explosion, and handle the large fault model and component domain associated with system level analysis are presented. Examples are used to motivate and illustrate the benefits of this tool. To further illustrate its capabilities, DEPEND is used to simulate the Unix-based Tandem triple-modular-redundancy (TMR) based prototype fault-tolerant system and evaluate how well it handles near-coincident errors caused by correlated and latent faults. Issues such as memory scrubbing, re-integration policies, and workload dependent repair times, which affect how the system handles near-coincident errors, are also evaluated. Unlike any other simulation-based dependability studies, the accuracy of the simulation model is validated by comparing the results of the simulations with measurements obtained from fault injection experiments conducted on a production Tandem machine.