Fault tolerance in distributed systems is closely tied to replication, which can be either passive or active. To let an application recover when a fault occurs, fault analysis and prevention techniques are used so that the faulty service is replaced by another one; the goal of fault tolerance is to preserve the correct behaviour of the application. In this paper we propose a fault-tolerant framework that operates along the invocation of a service. The framework replaces a faulty service with an equivalent service or with a composition of services. The approach consists in collecting, on a particular node called the "supervisor", all the features required to protect the execution of a distributed application when a fault occurs in one of its running services. The faults we aim to tolerate in our platform are mainly hardware faults, link breakdowns, and faults due to the random mobility of nodes in wireless ad hoc networks.
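The replacement idea can be illustrated with a minimal Python sketch. It is not the paper's implementation: the names Supervisor, ServiceFault, register, invoke and compose are hypothetical, services are modelled as plain in-process functions, and a fault is modelled as a raised exception rather than a real hardware, link, or mobility failure.

```python
from typing import Callable, Dict, List, Sequence

Service = Callable[[str], str]


class ServiceFault(Exception):
    """Hypothetical signal that a service failed during an invocation."""


class Supervisor:
    """Hypothetical supervisor node: records each service and its equivalents."""

    def __init__(self) -> None:
        self._services: Dict[str, Service] = {}
        self._equivalents: Dict[str, List[str]] = {}

    def register(self, name: str, service: Service,
                 equivalents: Sequence[str] = ()) -> None:
        self._services[name] = service
        self._equivalents[name] = list(equivalents)

    def invoke(self, name: str, request: str) -> str:
        # Try the requested service first, then each registered equivalent.
        for candidate in [name, *self._equivalents.get(name, [])]:
            service = self._services.get(candidate)
            if service is None:
                continue  # equivalent was declared but never registered
            try:
                return service(request)
            except ServiceFault:
                continue  # treat as faulty and move on to the next candidate
        raise ServiceFault(f"no working replacement for {name!r}")


def compose(*stages: Service) -> Service:
    """Build a replacement service by chaining several services in order."""
    def composed(request: str) -> str:
        for stage in stages:
            request = stage(request)
        return request
    return composed


def faulty_normalize(request: str) -> str:
    # Stands in for a service whose node has become unreachable.
    raise ServiceFault("node unreachable")


def trim(request: str) -> str:
    return request.strip()


def upper(request: str) -> str:
    return request.upper()


supervisor = Supervisor()
supervisor.register("normalize", faulty_normalize,
                    equivalents=["trim_then_upper"])
supervisor.register("trim_then_upper", compose(trim, upper))

result = supervisor.invoke("normalize", "  hello ")
print(result)  # HELLO
```

Here the call to "normalize" fails, and the supervisor transparently serves the request through a composition of two other services. A real platform would also need fault detection and a notion of service equivalence, both of which this sketch simply assumes.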