A fail-silent node is a self-checking node that either functions correctly or stops functioning once an internal failure is detected. Such a node can be constructed from a number of conventional processors. In a software-implemented fail-silent node, the nonfaulty processors of the node must execute a message order protocol to "keep in step" and a comparison protocol to check each other. In this paper, the design and implementation of efficient protocols for a two-processor fail-silent node are described in detail. The performance figures obtained indicate that, in a wide class of applications requiring a high degree of fault tolerance, software-implemented fail-silent nodes constructed simply from standard "off-the-shelf" components are an attractive alternative to their hardware-implemented counterparts, which do require special-purpose hardware components such as fault-tolerant clocks, comparators, and bus interface circuits.
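
The fail-silent behaviour described above can be illustrated with a minimal sketch. It is not the protocol from the paper: the class `FailSilentNode`, the helper `run`, and the integer messages are hypothetical names chosen for illustration, and message ordering is assumed to be already solved by feeding both replicas the same list in the same order. The sketch shows only the comparison step: an output leaves the node only when both replicas agree, and the first disagreement silences the node permanently.

```python
from typing import Callable, List, Optional

Replica = Callable[[int], int]


class FailSilentNode:
    """Toy two-replica node (hypothetical, for illustration only)."""

    def __init__(self, replica_a: Replica, replica_b: Replica) -> None:
        self.replicas = (replica_a, replica_b)
        self.stopped = False  # becomes True after the first detected mismatch

    def handle(self, message: int) -> Optional[int]:
        # A stopped node stays silent for every later message.
        if self.stopped:
            return None
        # Both replicas process the same message (same order is assumed).
        out_a, out_b = (replica(message) for replica in self.replicas)
        # Comparison step: any disagreement is treated as an internal failure.
        if out_a != out_b:
            self.stopped = True
            return None
        return out_a


def run(node: FailSilentNode, messages: List[int]) -> List[Optional[int]]:
    """Feed messages to the node in order and collect what it emits."""
    return [node.handle(m) for m in messages]


def double(x: int) -> int:
    return 2 * x


def faulty_double(x: int) -> int:
    # Simulated fault: returns a wrong result for the input 2 only.
    return 2 * x + 1 if x == 2 else 2 * x


healthy = run(FailSilentNode(double, double), [1, 2, 3])
faulty = run(FailSilentNode(double, faulty_double), [1, 2, 3])
```

With these assumptions, the healthy node emits a result for every message, while the node with the simulated fault emits the first result and then stays silent, including for the third message that both replicas would have computed correctly.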