A formal model for fault-tolerance in distributed systems

  • Authors:
  • Brahim Hamid;Mohamed Mosbah

  • Affiliations:
  • LaBRI, ENSEIRB, University of Bordeaux-1, Talence, France;LaBRI, ENSEIRB, University of Bordeaux-1, Talence, France

  • Venue:
  • SAFECOMP'05 Proceedings of the 24th international conference on Computer Safety, Reliability, and Security
  • Year:
  • 2005

Quantified Score

Hi-index 0.00

Visualization

Abstract

We present a formal method based on graph rewriting systems for the specifications and the proofs of fault-tolerant distributed algorithms. Our method deals with crash failures. In a crash failure system the process can fail by crashing, i.e. by permanently halting. The faulty processes are the processes contaminated by the crashes. The methodology is formalized in two phases. In the first phase, we build the set of illegitimate configurations to specify the faults and the faulty processes. The second phase is devoted to the addition of correction rules in the initial graph rewriting system used to encode the distributed algorithm. These rules are able to detect and eliminate the faults locally during the computation. This method can be implemented under an asynchronous message passing system which notifies the faults. To illustrate this approach, we present examples of fault-tolerant distributed spanning tree algorithms.