A fault-tolerant system is designed to provide sustained delivery of services despite encountered perturbations. The ability to accurately detect, diagnose, and recover from faults on-line (i.e., during system operation) is an important aspect of fault tolerance. This fault detection, isolation, and recovery (FDIR) process has two primary objectives: to consistently identify a faulty node so as to restrict its effect on system operations, and to support system recovery by isolating faulty resources and reconfiguring the system to sustain ongoing operations. Performed on-line, FDIR provides an effective resource-management capability. It responds promptly to the appearance and disappearance of faults, which keeps short the window during which the system is susceptible to the accumulation of further faults.
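The following is a minimal sketch of the two FDIR objectives described above, written under stated assumptions. It assumes a hypothetical round-based system in which a consistent detection step delivers the same set of suspected nodes to every healthy node each round, so all nodes maintain identical counters and reach the same isolation decision. For diagnosis it uses a penalty/reward counter, a threshold-based technique chosen here purely for illustration. It is not necessarily the mechanism the paper analyzes. The class `FdirMonitor` and its parameters `penalty_threshold` and `reward_threshold` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class FdirMonitor:
    """Hypothetical on-line FDIR loop over a fixed set of nodes.

    Assumption: every healthy node receives the same `suspected` set each
    round, so every copy of this monitor makes identical decisions.
    """
    nodes: list[str]
    penalty_threshold: int = 3  # hypothetical: suspicions before isolation
    reward_threshold: int = 5   # hypothetical: clean rounds before forgiving
    penalty: dict[str, int] = field(default_factory=dict)
    reward: dict[str, int] = field(default_factory=dict)
    isolated: set[str] = field(default_factory=set)

    def round(self, suspected: set[str]) -> set[str]:
        """Process one diagnostic round and return the active node set."""
        for node in self.nodes:
            if node in self.isolated:
                continue  # already excluded from system operations
            if node in suspected:
                # Diagnosis: accumulate evidence against the node.
                self.penalty[node] = self.penalty.get(node, 0) + 1
                self.reward[node] = 0
                if self.penalty[node] >= self.penalty_threshold:
                    # Isolation: restrict the node's effect on the system.
                    self.isolated.add(node)
            elif self.penalty.get(node, 0) > 0:
                # The fault has disappeared: after enough clean rounds,
                # forget old penalties so sporadic transients don't add up.
                self.reward[node] = self.reward.get(node, 0) + 1
                if self.reward[node] >= self.reward_threshold:
                    self.penalty[node] = 0
                    self.reward[node] = 0
        # Reconfiguration: the non-isolated nodes form the new active set.
        return {n for n in self.nodes if n not in self.isolated}


if __name__ == "__main__":
    monitor = FdirMonitor(nodes=["A", "B", "C", "D"])
    monitor.round({"B"})              # B is suspected once
    for _ in range(3):
        active = monitor.round({"C"})  # C is suspected repeatedly
    print(sorted(active))             # C is isolated, B is still active
```

The two thresholds capture the trade-off that the abstract's last sentence points to. A low `penalty_threshold` isolates a faulty node quickly and shortens the window of susceptibility, but it risks excluding healthy nodes that suffered only a transient fault. The reward counter lets a node's record recover once its faults disappear. This sketch does not cover reintegrating an isolated node after repair.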