An instruction-retry policy is proposed to enhance the fault tolerance of triple modular redundant (TMR) controller computers by adding time redundancy. A TMR failure is said to occur when a TMR system fails to establish a majority among its modules' outputs because of multiple faulty modules or a faulty voter. A dynamic failure, that is, a failure to meet the timing constraints of one or more tasks during a given mission, may result either from multiple consecutive TMR failures whose active period exceeds a certain time limit or from the exhaustion of spares caused by frequent system reconfigurations. An optimal instruction-retry period is derived by minimizing the probability of dynamic failure upon detection of either an error masked by the TMR or a TMR failure. Using the derived optimal retry period, we also derive the minimum number of spares needed to keep the probability of dynamic failure for a given mission below a prespecified level.
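To make the notions of majority voting, TMR failure, and retry concrete, the following is a minimal sketch and not the policy derived in the paper. It assumes that modules are plain callables with hashable outputs, that a retry is triggered only by a TMR failure (the masked-error case is omitted), and that the retry budget is a fixed count rather than an optimized time period. The names `majority_vote`, `run_with_retry`, and `max_retries` are hypothetical.

```python
from collections import Counter
from typing import Callable, Hashable, Optional, Sequence, Tuple


def majority_vote(outputs: Sequence[Hashable]) -> Optional[Hashable]:
    """Return the value agreed on by a strict majority of modules, else None.

    A result of None models a TMR failure: no majority could be established.
    """
    if not outputs:
        return None
    value, count = Counter(outputs).most_common(1)[0]
    return value if count > len(outputs) / 2 else None


def run_with_retry(
    modules: Sequence[Callable[[int], Hashable]],
    operand: int,
    max_retries: int,
) -> Tuple[Optional[Hashable], int]:
    """Execute one instruction on all modules, retrying on TMR failure.

    Returns (voted_result, attempts). A voted_result of None after the
    retry budget is spent is where a real system would reconfigure and
    switch in a spare (not modeled here).
    """
    attempts = 0
    for _ in range(max_retries + 1):  # one initial try plus the retries
        attempts += 1
        outputs = [module(operand) for module in modules]
        result = majority_vote(outputs)
        if result is not None:
            return result, attempts
    return None, attempts
```

With three modules of which two suffer a transient fault on the first execution, the first vote fails and one retry restores the majority. With `max_retries=0` the same scenario ends in an unrecovered TMR failure.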