A Distributed Algorithm for Fault Diagnosis in Systems with Soft Failures

  • Authors:
  • C.-L. Yang; G. M. Masson

  • Affiliations:
  • GTE Labs, Waltham, MA; Johns Hopkins Univ., Baltimore, MD

  • Venue:
  • IEEE Transactions on Computers
  • Year:
  • 1988

Abstract

This paper considers the system-level diagnosis of soft failures in large, fully distributed networks of processors (or units). It employs a system model in which each unit is assumed to be able to test (or evaluate) certain other units for the presence of failures. Using this model, and assuming that the total number of faulty units does not exceed a given bound, the paper presents a distributed algorithm that allows all fault-free units to independently converge to correct and consistent diagnoses of the system status. The algorithm is also shown to apply to bounded-fault situations in which both units and communication links can be faulty.
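To make the testing model concrete, the sketch below uses the classical PMC-style rules (named after Preparata, Metze, and Chien). This is our assumption, since the abstract does not spell out its exact test-outcome rules: a fault-free tester always reports the true status of the unit it tests, while a faulty tester's report is unconstrained. Given all test outcomes (the "syndrome") and a bound `t` on the number of faulty units, the sketch enumerates every fault set of size at most `t` that explains the syndrome. It is not the paper's distributed algorithm. It assumes that the full syndrome is already known, ignores faulty links, and uses exponential brute force purely for illustration. The names `ring_tests`, `make_syndrome`, `consistent`, and `diagnose` are hypothetical, as is the test assignment in which unit i tests units i+1 and i+2 (mod n).

```python
from itertools import combinations


def ring_tests(n):
    """Hypothetical test assignment: unit i tests units i+1 and i+2 (mod n)."""
    return [(i, (i + d) % n) for i in range(n) for d in (1, 2)]


def make_syndrome(tests, faulty, faulty_report=0):
    """Simulate outcomes (0 = 'looks fault-free', 1 = 'looks faulty').

    Fault-free testers tell the truth; faulty testers return
    faulty_report regardless of the tested unit's real status.
    """
    return {
        (u, v): faulty_report if u in faulty else int(v in faulty)
        for (u, v) in tests
    }


def consistent(fault_set, syndrome):
    """True if fault_set explains every outcome under PMC-style assumptions."""
    for (tester, tested), result in syndrome.items():
        # Only fault-free testers constrain the diagnosis.
        if tester not in fault_set and result != int(tested in fault_set):
            return False
    return True


def diagnose(units, syndrome, t):
    """Return every fault set of size <= t consistent with the syndrome."""
    candidates = []
    for k in range(t + 1):
        for combo in combinations(units, k):
            fault_set = frozenset(combo)
            if consistent(fault_set, syndrome):
                candidates.append(fault_set)
    return candidates


syndrome = make_syndrome(ring_tests(5), faulty={2})
print(diagnose(range(5), syndrome, t=1))  # [frozenset({2})]
```

With five units and `t = 1`, only `{2}` explains the outcomes, even though faulty unit 2 falsely reports its neighbors as fault-free. Because `diagnose` is deterministic, every fault-free unit that obtained the same syndrome would reach the same diagnosis. In the distributed setting the abstract describes, the hard part is each unit gathering reliable test information over the network, which this sketch deliberately leaves out.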