We present a formal method, based on graph rewriting systems, for specifying fault-tolerant distributed algorithms and proving them correct. The method deals with crash failures: in a crash-failure system, a process can fail by crashing, i.e., by halting permanently. The faulty processes are those contaminated by the crashes. The methodology is formalized in two phases. In the first phase, we build the set of illegitimate configurations, which specifies the faults and the faulty processes. In the second phase, we add correction rules to the initial graph rewriting system that encodes the distributed algorithm. These rules detect and eliminate the faults locally during the computation. The method can be implemented on an asynchronous message-passing system that notifies processes of faults. To illustrate the approach, we present examples of fault-tolerant distributed spanning tree algorithms.
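To make the two phases concrete, here is a minimal sketch in Python of a spanning tree computation extended with a correction rule. It is an illustration under stated assumptions, not the rewriting system defined in the paper. The labels `"A"` (in the tree) and `"N"` (not in the tree), the function names, and the example graph are all hypothetical. The `crashed` set stands in for the fault-notification service. The sketch further assumes that the root never crashes, that rules are applied one at a time, and that correction rules take priority over the growth rule.

```python
import random

def illegitimate(adj, label, parent, crashed, root):
    """Phase 1 (assumed form): live tree nodes whose parent crashed or was reset."""
    return [v for v in adj
            if v not in crashed and v != root and label[v] == "A"
            and (parent[v] in crashed or label[parent[v]] == "N")]

def growth_pairs(adj, label, crashed):
    """Growth rule matches: a live tree node u next to a live non-tree node v."""
    return [(u, v)
            for u in adj if u not in crashed and label[u] == "A"
            for v in adj[u] if v not in crashed and label[v] == "N"]

def run(adj, label, parent, crashed, root, rng):
    """Apply rules until none matches; corrections are tried first (assumption)."""
    while True:
        bad = illegitimate(adj, label, parent, crashed, root)
        if bad:
            # Correction rule: a contaminated node locally resets itself.
            v = rng.choice(bad)
            label[v], parent[v] = "N", None
            continue
        pairs = growth_pairs(adj, label, crashed)
        if not pairs:
            return
        # Growth rule: v joins the tree with u as its parent.
        u, v = rng.choice(pairs)
        label[v], parent[v] = "A", u

# Hypothetical 5-node network; node 0 is the root.
adj = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1, 3], 3: [1, 2, 4], 4: [3]}
root = 0
label = {v: ("A" if v == root else "N") for v in adj}
parent = {v: None for v in adj}
crashed = set()
rng = random.Random(0)

run(adj, label, parent, crashed, root, rng)   # build an initial spanning tree
crashed.add(1)                                # node 1 crashes and is notified
run(adj, label, parent, crashed, root, rng)   # correct locally, then rebuild
```

After node 1 crashes, the live nodes form the path 0, 2, 3, 4, so the repaired tree is the same whichever rule instances the scheduler picks. Giving corrections priority is a simplifying design choice: it ensures a reset node cannot reattach to one of its own descendants that has not yet been reset, which would create a cycle.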