A formal model for fault-tolerance in distributed systems

Authors:
Brahim Hamid; Mohamed Mosbah
Affiliations:
LaBRI, ENSEIRB, University of Bordeaux-1, Talence, France; LaBRI, ENSEIRB, University of Bordeaux-1, Talence, France
Venue:
SAFECOMP'05 Proceedings of the 24th international conference on Computer Safety, Reliability, and Security
Year:
2005

Abstract

We present a formal method based on graph rewriting systems for the specification and proof of fault-tolerant distributed algorithms. Our method deals with crash failures: in a crash-failure system, a process can fail by crashing, i.e., by halting permanently. The faulty processes are those contaminated by the crashes. The methodology is formalized in two phases. In the first phase, we build the set of illegitimate configurations, which specifies the faults and the faulty processes. In the second phase, we add correction rules to the initial graph rewriting system that encodes the distributed algorithm. These rules detect and eliminate the faults locally during the computation. The method can be implemented on an asynchronous message-passing system that provides fault notification. To illustrate the approach, we present examples of fault-tolerant distributed spanning tree algorithms.
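
The two phases can be illustrated with a small sequential simulation. The following Python sketch is not the authors' rule set; it is a minimal illustration under stated assumptions: nodes carry a label "A" (in the tree) or "N" (neutral), a single construction rule attaches a neutral neighbour to a tree node, the root never crashes, and correction rules take priority over the construction rule (simulated here by running them to a fixpoint first). The names `TreeState`, `illegitimate`, `correct`, and `build` are hypothetical.

```python
from typing import Dict, Optional, Set

Graph = Dict[str, Set[str]]


class TreeState:
    """Labelled graph for a spanning-tree relabelling system (illustrative only)."""

    def __init__(self, graph: Graph, root: str) -> None:
        self.graph = graph
        self.root = root
        self.alive: Set[str] = set(graph)          # processes that have not crashed
        self.label: Dict[str, str] = {v: "N" for v in graph}
        self.parent: Dict[str, Optional[str]] = {v: None for v in graph}
        self.label[root] = "A"                     # the root starts in the tree

    def crash(self, v: str) -> None:
        """A crashed process halts permanently (assumed: never the root)."""
        self.alive.discard(v)

    def illegitimate(self) -> Set[str]:
        """Phase 1: live tree nodes whose parent crashed or left the tree."""
        bad = set()
        for v in self.alive:
            if v == self.root or self.label[v] != "A":
                continue
            p = self.parent[v]
            if p not in self.alive or self.label[p] != "A":
                bad.add(v)
        return bad

    def correct(self) -> None:
        """Phase 2: correction rule, applied until no illegitimate node remains.

        Each faulty node resets itself to neutral; this in turn exposes its
        children, so the reset spreads through the contaminated subtree.
        """
        bad = self.illegitimate()
        while bad:
            for v in bad:
                self.label[v] = "N"
                self.parent[v] = None
            bad = self.illegitimate()

    def build(self) -> None:
        """Apply corrections first, then the construction rule, to a fixpoint."""
        progress = True
        while progress:
            self.correct()
            progress = False
            for u in sorted(self.alive):
                if self.label[u] != "A":
                    continue
                for v in sorted(self.graph[u]):
                    # Construction rule: a tree node adopts a live neutral neighbour.
                    if v in self.alive and self.label[v] == "N":
                        self.label[v] = "A"
                        self.parent[v] = u
                        progress = True


# Usage: a 4-cycle a-b-c-d-a rooted at "a"; "b" crashes after the tree is built.
ring: Graph = {"a": {"b", "d"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"a", "c"}}
state = TreeState(ring, root="a")
state.build()
state.crash("b")
state.build()   # "c" detects its crashed parent, resets, and re-attaches via "d"
```

Running the corrections to a fixpoint before any construction step is a simplification that keeps a contaminated node from adopting a freshly reset ancestor and forming a cycle; a genuinely local scheme would need additional rules or priorities to rule this out.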