Design and Analysis of an Optimal Instruction-Retry Policy for TMR Controller Computers

Authors:
Hagbae Kim; Kang G. Shin
Venue:
IEEE Transactions on Computers
Year:
1996

Abstract

An instruction-retry policy is proposed to enhance the fault tolerance of triple modular redundant (TMR) controller computers by adding time redundancy to them. A TMR failure is said to occur if a TMR system fails to establish a majority among its modules' outputs because of multiple faulty modules or a faulty voter. A dynamic failure, i.e., failure to meet the timing constraints of one or more tasks during a given mission, may result either from multiple consecutive TMR failures whose active period exceeds a certain time limit or from the exhaustion of spares caused by frequent system reconfigurations. An optimal instruction-retry period is derived by minimizing the probability of dynamic failure upon detection of either an error masked by the TMR or a TMR failure. Using the derived optimal retry period, we also derive the minimum number of spares needed to keep the probability of dynamic failure for a given mission below a prespecified level.
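As a rough illustration of the two error classes the abstract distinguishes, the Python sketch below classifies the outputs of three modules with a majority voter (all agree, masked error, or TMR failure) and re-executes an instruction while an error is detected. It assumes a fault-free voter and integer module outputs. The names `vote`, `execute_with_retry`, and the fixed `max_retries` budget are hypothetical; the budget only marks where a retry period enters and is not the optimal retry period the paper derives.

```python
from collections import Counter
from enum import Enum
from typing import Callable, List, Optional, Sequence, Tuple


class VoteOutcome(Enum):
    AGREE = "all three modules agree"
    MASKED_ERROR = "one module disagrees; the majority masks the error"
    TMR_FAILURE = "no majority among the three outputs"


def vote(outputs: Sequence[int]) -> Tuple[VoteOutcome, Optional[int]]:
    """Majority-vote three module outputs (assumes the voter itself is fault-free)."""
    if len(outputs) != 3:
        raise ValueError("TMR expects exactly three module outputs")
    value, count = Counter(outputs).most_common(1)[0]
    if count == 3:
        return VoteOutcome.AGREE, value
    if count == 2:
        return VoteOutcome.MASKED_ERROR, value
    # Three distinct outputs: no majority can be established.
    return VoteOutcome.TMR_FAILURE, None


def execute_with_retry(
    run_modules: Callable[[], Sequence[int]], max_retries: int
) -> Tuple[Optional[int], List[VoteOutcome]]:
    """Hypothetical retry loop: re-execute while a masked error or TMR failure is seen.

    Returns the last voted value (None after a TMR failure) and the outcome history.
    If the budget runs out with an error still present, a real system would
    presumably reconfigure (e.g. switch in a spare); that step is not modeled here.
    """
    if max_retries < 0:
        raise ValueError("max_retries must be non-negative")
    history: List[VoteOutcome] = []
    value: Optional[int] = None
    for _ in range(max_retries + 1):
        outcome, value = vote(run_modules())
        history.append(outcome)
        if outcome is VoteOutcome.AGREE:
            break  # the (transient) fault has disappeared; accept the result
    return value, history


if __name__ == "__main__":
    # Simulated transient fault: one module disagrees on the first attempt only.
    attempts = iter([[7, 7, 3], [7, 7, 7]])
    value, history = execute_with_retry(lambda: next(attempts), max_retries=2)
    print(value, [o.name for o in history])  # 7 ['MASKED_ERROR', 'AGREE']
```

In the paper's formulation the retry period is chosen to minimize the probability of dynamic failure rather than fixed in advance; this sketch only shows how retrying lets a transient fault clear before the system commits to a reconfiguration.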