A Distributed System-Level Diagnosis Algorithm for Arbitrary Network Topologies

  • Authors:
  • Sampath Rangarajan; Anton T. Dahbura; Eric A. Ziegler

  • Venue:
  • IEEE Transactions on Computers - Special issue on fault-tolerant computing
  • Year:
  • 1995

Abstract

In this paper, a distributed algorithm is described for detecting and diagnosing faulty processors in an arbitrary network. Fault-free processors perform simple periodic tests on one another; when a fault is detected or a newly-repaired processor joins the network, this new information is disseminated in parallel throughout the network. It is formally proven that the algorithm is correct, and it is also shown that the algorithm is optimal in terms of the time required for all of the fault-free processors in the network to learn of a new event. Simulation results are given for arbitrary network topologies.

Index Terms: Computer fault diagnosis, computer fault tolerance, computer networks, distributed computing, system-level fault diagnosis, distributed algorithm, fault detection.
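
The parallel dissemination step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm, whose details the abstract does not give. It assumes a simplified model in which every fault-free processor that learns of an event forwards it to all of its neighbors in one round, faulty processors forward nothing, and each hop takes one time unit. The function name `disseminate`, the adjacency-list format, and the example topology are all hypothetical. Under these assumptions, the round in which a processor learns of the event equals its hop distance from the detecting processor through fault-free processors only.

```python
from collections import deque


def disseminate(adjacency, faulty, origin):
    """Flood a new event from `origin` through fault-free processors only.

    adjacency: dict mapping each processor to a list of its neighbors
               (hypothetical input format, any connected or disconnected graph).
    faulty:    set of processors assumed faulty; they never relay the event.
    origin:    fault-free processor that first detects the event.

    Returns a dict mapping each fault-free processor that is reached to the
    number of forwarding rounds after which it learns of the event.
    """
    if origin in faulty:
        raise ValueError("origin must be fault-free")

    learned = {origin: 0}          # processor -> round in which it learned
    frontier = deque([origin])     # breadth-first order models parallel rounds
    while frontier:
        node = frontier.popleft()
        for neighbor in adjacency[node]:
            # Skip faulty processors and those that already know the event.
            if neighbor in faulty or neighbor in learned:
                continue
            learned[neighbor] = learned[node] + 1
            frontier.append(neighbor)
    return learned


# Hypothetical 5-processor ring: 0-1-2-3-4-0, with processor 1 faulty.
ring = {0: [1, 4], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [0, 3]}
print(disseminate(ring, faulty={1}, origin=0))
# → {0: 0, 4: 1, 3: 2, 2: 3}
```

In this example processor 2 is adjacent to the faulty processor 1, so the event reaches it the long way around the ring in three rounds. Fault-free processors that are cut off from the origin by faulty ones simply do not appear in the result.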