Implementation of Online Distributed System-Level Diagnosis Theory

  • Authors: Ronald P. Bianchini, Jr.; Richard W. Buskens

  • Venue: IEEE Transactions on Computers - Special issue on fault-tolerant computing
  • Year: 1992

Abstract

The practical application and implementation of online distributed system-level diagnosis theory are documented. Proven distributed diagnosis algorithms are shown to be impractical in real systems because of their high resource requirements. A distributed system-level diagnosis algorithm called Adaptive DSD is shown to minimize network resource usage and has resulted in a practical implementation. Adaptive DSD assumes a distributed network in which nodes can test other nodes and determine them to be faulty or fault-free. Each node issues its tests adaptively, depending on the current fault situation of the network. Test result reports are generated from the test results and forwarded between nodes in the network. Adaptive DSD is proven correct in that each fault-free node reaches an accurate, independent diagnosis of the fault conditions of the remaining nodes. No restriction is placed on the number of faulty nodes; any fault situation with any number of faulty nodes is diagnosed correctly. An implementation of the Adaptive DSD algorithm is described.
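
The adaptive testing and report forwarding described in the abstract can be illustrated with a small simulation. This is a minimal sketch, not the authors' implementation. The abstract does not give the testing order or data structures, so the sketch assumes the following: nodes are arranged in a logical ring, each fault-free node tests its successors in order until it finds a fault-free one, tests are always accurate, faulty nodes stay silent, and the fault situation does not change during the run. The names `run_round`, `diagnose`, `simulate`, and `tested_up` are hypothetical and chosen for illustration only.

```python
from typing import Dict, List, Set

NONE = -1  # marker: "no fault-free node known" (assumption of this sketch)


def run_round(n: int, faulty: Set[int], tested_up: List[List[int]]) -> None:
    """One testing round over all nodes.

    tested_up[i][k] is node i's current belief about which node k
    tested as fault-free. Reports are copied from a snapshot so that
    information moves one hop per round.
    """
    snapshot = [row[:] for row in tested_up]
    for i in range(n):
        if i in faulty:
            continue  # assumed: faulty nodes neither test nor report
        for step in range(1, n):
            j = (i + step) % n
            if j not in faulty:  # the test of node j passes
                # Accept j's forwarded reports for every node except i itself.
                for k in range(n):
                    if k != i:
                        tested_up[i][k] = snapshot[j][k]
                tested_up[i][i] = j  # record i's own test result
                break
        else:
            tested_up[i][i] = NONE  # every other node tested faulty


def diagnose(node: int, tested_up: List[List[int]]) -> Set[int]:
    """Return the nodes that `node` diagnoses as fault-free.

    Follows the chain of passing tests starting at `node`; anything
    not reached on that chain is diagnosed as faulty.
    """
    fault_free = {node}
    cur = tested_up[node][node]
    while cur != NONE and cur not in fault_free:
        fault_free.add(cur)
        cur = tested_up[node][cur]
    return fault_free


def simulate(n: int, faulty: Set[int]) -> Dict[int, Set[int]]:
    """Run n rounds, then return each fault-free node's set of diagnosed-faulty nodes."""
    tested_up = [[NONE] * n for _ in range(n)]
    for _ in range(n):  # n rounds suffice for reports to cross the ring
        run_round(n, faulty, tested_up)
    everyone = set(range(n))
    return {i: everyone - diagnose(i, tested_up)
            for i in range(n) if i not in faulty}
```

Calling `simulate(5, {1, 2})` returns one entry per fault-free node, each holding that node's independently derived set of faulty nodes. Because a node keeps testing until it finds a fault-free successor, the sketch places no bound on the number of faulty nodes, which mirrors the claim in the abstract under the stated assumptions.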