Some relationships between failure detection probability and computer system reliability

  • Authors:
  • Henry Wyle; Gerald J. Burnett

  • Affiliations:
  • Autonetics, Anaheim, California; Autonetics, Anaheim, California

  • Venue:
  • AFIPS '67 (Fall) Proceedings of the November 14-16, 1967, fall joint computer conference
  • Year:
  • 1967

Abstract

The relationships between computer failure rates and the failure rates of the modules from which the computers are constructed are well known. The analytical techniques permitting derivation of one parameter from the other for a given design are in widespread use. With the increasing interest in ultra-reliable computer systems, various approaches to increasing reliability through the use of redundancy have been proposed and, in some cases, implemented. A feature common to many of these approaches is the inclusion of on-line spare modules (either used or idle), with provision in the computer system for automatic replacement of a failed module by an on-line spare. Systems of the "graceful degradation" type generally fall into this class, as do some "self-repairing" systems and other system types. Such systems are basically self-reconfiguring. Using modules of ordinary failure rates, they are theoretically capable of astronomically high reliabilities. There is to date, however, a good deal of reserve on the part of the general computing community about accepting these reliabilities at face value, since theory and practice are usually separated from each other by a host of difficult and sometimes ill-defined problems.