Implementing Fail-Silent Nodes for Distributed Systems

Authors:
Francisco V. Brasileiro; Paul Devadoss Ezhilchelvan; Santosh K. Shrivastava; Neil A. Speirs; S. Tao
Affiliations:
Univ. Federal da Paraiba, Paraiba, Brazil; Univ. of Newcastle upon Tyne, Newcastle upon Tyne, UK; Univ. of Newcastle upon Tyne, Newcastle upon Tyne, UK; Univ. of Newcastle upon Tyne, Newcastle upon Tyne, UK; Parallax Solutions, Ltd., Coventry, UK
Venue:
IEEE Transactions on Computers
Year:
1996

Abstract

A fail-silent node is a self-checking node that either functions correctly or stops functioning after an internal failure is detected. Such a node can be constructed from a number of conventional processors. In a software-implemented fail-silent node, the nonfaulty processors of the node need to execute message order and comparison protocols to "keep in step" and check each other, respectively. This paper describes in detail the design and implementation of efficient protocols for a two-processor fail-silent node. The performance figures obtained indicate that, in a wide class of applications requiring a high degree of fault tolerance, software-implemented fail-silent nodes built from standard "off-the-shelf" components are an attractive alternative to hardware-implemented counterparts, which require special-purpose hardware components such as fault-tolerant clocks, comparators, and bus interface circuits.
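
To make the order-then-compare idea concrete, here is a minimal single-process sketch in Python. It is not the protocol from the paper: the class `FailSilentNode`, the replica functions, and the shared sequence counter are hypothetical names introduced purely for illustration, and a real node would run the two replicas on separate processors that exchange messages. The sketch only shows the externally visible behaviour the abstract describes: outputs are released while both replicas agree, and the node stops producing output once a disagreement is detected.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class FailSilentNode:
    """Toy two-replica node (illustrative only, not the paper's design)."""
    replica_a: Callable[[int, str], str]
    replica_b: Callable[[int, str], str]
    silent: bool = False
    next_seq: int = 0
    outputs: List[str] = field(default_factory=list)

    def submit(self, message: str) -> Optional[str]:
        # Once a disagreement has been seen, the node stays silent for good.
        if self.silent:
            return None
        # "Order" step (assumption): a single shared sequence number stands in
        # for the agreement both processors would reach on input order.
        seq = self.next_seq
        self.next_seq += 1
        # Both replicas process the same input at the same position.
        result_a = self.replica_a(seq, message)
        result_b = self.replica_b(seq, message)
        # "Compare" step: any mismatch is treated as an internal failure.
        if result_a != result_b:
            self.silent = True
            return None
        self.outputs.append(result_a)
        return result_a


def healthy(seq: int, message: str) -> str:
    """Hypothetical correct replica: tags the upper-cased message."""
    return f"{seq}:{message.upper()}"


def faulty_after_first(seq: int, message: str) -> str:
    """Hypothetical replica that starts producing wrong results after seq 0."""
    return healthy(seq, message) if seq == 0 else f"{seq}:garbage"


node = FailSilentNode(healthy, faulty_after_first)
results = [node.submit(m) for m in ["a", "b", "c"]]
```

In this run the first message is released because both replicas agree; the second triggers a mismatch, after which the node emits nothing further, including for inputs that a healthy replica could still have processed correctly.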