Fault-Tolerant Computing
The basic concepts of fault-tolerant computing are reviewed, focusing on hardware. Failures, faults, and errors in digital systems are examined, and measures of dependability, which dictate and evaluate fault-tolerance strategies for different classes of applications, are defined. The elements of fault-tolerance strategies are identified, and the following strategies are reviewed: error detection, masking, and correction; error detection and correction codes; self-checking logic; module replication for error detection and masking; protocol and timing checks; fault containment; reconfiguration and repair; and system recovery.
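
As an illustration of one strategy named above, module replication for error detection and masking, the following is a minimal software sketch of majority voting over replicated modules. It is not taken from the article, which focuses on hardware; the names `majority_vote`, `run_replicated`, `square`, and `faulty_square` are hypothetical, and the sketch assumes that replica outputs are hashable and directly comparable.

```python
from collections import Counter
from typing import Callable, Hashable, Sequence


def majority_vote(outputs: Sequence[Hashable]) -> Hashable:
    """Return the value produced by a strict majority of replicas.

    Raises ValueError when no strict majority exists: the disagreement
    is detected, but it cannot be masked.
    """
    if not outputs:
        raise ValueError("no replica outputs to vote on")
    value, count = Counter(outputs).most_common(1)[0]
    if count * 2 <= len(outputs):
        raise ValueError("no majority: error detected but not masked")
    return value


def run_replicated(replicas: Sequence[Callable[[int], int]], x: int) -> int:
    """Run every replica on the same input and vote on the results."""
    return majority_vote([replica(x) for replica in replicas])


# Hypothetical modules: one correct, one with an injected fault.
def square(x: int) -> int:
    return x * x


def faulty_square(x: int) -> int:
    return x * x + 1  # assumed fault model: result is off by one


# Three replicas: the single faulty output is outvoted (masking).
masked_result = run_replicated([square, faulty_square, square], 4)

# Two replicas: the mismatch is detected but cannot be masked.
try:
    run_replicated([square, faulty_square], 4)
    detected = False
except ValueError:
    detected = True
```

With three replicas a single faulty module is masked, while two replicas only allow the disagreement to be detected. In this sketch the voter itself is assumed to be fault-free, which a real design would also have to address.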