Design and Analysis of an Optimal Instruction-Retry Policy for TMR Controller Computers

  • Authors:
  • Hagbae Kim;Kang G. Shin

  • Affiliations:
  • -;-

  • Venue:
  • IEEE Transactions on Computers
  • Year:
  • 1996

Quantified Score

Hi-index 14.98

Visualization

Abstract

An instruction-retry policy is proposed to enhance the fault-tolerance of triple modular redundant (TMR) controller computers by adding time redundancy to them. A TMR failure is said to occur if a TMR system fails to establish a majority among its modules' outputs due to multiple faulty modules or a faulty voter. Either multiple consecutive TMR failures the active period of which exceeds a certain time limit or the exhaustion of spares as a result of frequent system reconfigurations may result in failure to meet the timing constraints of one or more tasks, called the dynamic failure, during a given mission. An optimal instruction-retry period is derived by minimizing the probability of dynamic failure upon detection of either a masked (by the TMR) error or a TMR failure. We also derive the minimum number of spares needed to keep below the pre-specified level the probability of dynamic failure for a given mission by using the derived optimal retry period.