Tolerating transient and intermittent failures

  • Authors:
  • Sylvie Delaët; Sébastien Tixeuil

  • Affiliations:
  • Laboratoire de Recherche en Informatique, UMR CNRS 8623, Université de Paris Sud, 91405 Orsay Cedex, France; Laboratoire de Recherche en Informatique, UMR CNRS 8623, Université de Paris Sud, 91405 Orsay Cedex, France

  • Venue:
  • Journal of Parallel and Distributed Computing - Self-stabilizing distributed systems
  • Year:
  • 2002

Abstract

Fault tolerance is a crucial property for recent distributed systems. We propose an algorithm that solves the census problem (list all processor identifiers and their relative distances) on an arbitrary strongly connected network. This algorithm tolerates transient faults that corrupt the memory of processors and communication links (it is self-stabilizing) as well as intermittent faults (fair loss, reordering, and finite duplication of messages) on communication media. A formal proof establishes its correctness for the considered problem. Our algorithm leads to the construction of algorithms for any silent problem that are self-stabilizing while supporting the same communication hazards.
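
To make the census problem concrete, the following minimal sketch computes the output a correct census would produce on a small directed network: for each processor, the identifiers of all processors together with their hop distance. It is a centralized reference computation using breadth-first search, not the distributed self-stabilizing algorithm of the paper. The function name `census`, the adjacency-dictionary input, and the choice to measure distance along outgoing links are assumptions made for illustration.

```python
from collections import deque


def census(links):
    """Return, for every processor, a table {identifier: hop distance}.

    `links` maps each processor identifier to the identifiers it can send
    to (a hypothetical representation; the network is assumed to be
    strongly connected, so every table lists every processor).
    """
    tables = {}
    for source in links:
        dist = {source: 0}          # a processor is at distance 0 from itself
        queue = deque([source])
        while queue:                # plain breadth-first search
            u = queue.popleft()
            for v in links[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        tables[source] = dist
    return tables


# A unidirectional ring of three processors: 1 -> 2 -> 3 -> 1.
ring = {1: [2], 2: [3], 3: [1]}
tables = census(ring)
```

On this ring, `tables[1]` lists processor 2 at distance 1 and processor 3 at distance 2. A distributed solution must reach the same tables using only message exchanges between neighbors, which is where the fault-tolerance requirements of the abstract apply.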