Evaluating Reliability Improvements of Fault Tolerant Array Processors Using Algorithm-Based Fault Tolerance

  • Authors: D. L. Tao; Kamal Kantawala
  • Venue: IEEE Transactions on Computers
  • Year: 1997

Abstract

Algorithm-based fault tolerance (ABFT) provides low-cost error protection for VLSI processor arrays used in real-time digital signal processing. The main objective of incorporating an ABFT technique in a processor array is to improve its reliability. However, previous ABFT approaches have been evaluated only in terms of their error detecting/correcting capabilities; the reliability improvement has never been addressed. In this paper, we develop a stochastic model for an array processor incorporating ABFT that takes the behavior of transient/intermittent failures and hardware overhead into account. This model is then used to evaluate the reliability and reliability improvements of several existing ABFT techniques that tolerate single faults. A user can therefore evaluate a number of ABFT techniques and make a trade-off between reliability and cost prior to implementation. Moreover, we have conducted extensive simulation experiments, and the simulation results validate the proposed model.
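
For readers unfamiliar with ABFT, the following is a minimal sketch of the classic checksum scheme for matrix multiplication, which tolerates a single faulty result element. The abstract does not say which ABFT techniques the paper evaluates, so this is an illustration of the general idea only, not the authors' method or model. All function names are hypothetical, and integer arithmetic is assumed so that checksums can be compared exactly.

```python
def column_checksum(a):
    """Append a row holding the sum of each column of a."""
    return a + [[sum(col) for col in zip(*a)]]


def row_checksum(b):
    """Append to each row of b the sum of that row."""
    return [row + [sum(row)] for row in b]


def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def locate_single_error(c_full):
    """Check a full-checksum product matrix.

    Returns None if all checksums agree, or (i, j, delta) when exactly one
    data element c_full[i][j] is off by delta. Faults in the checksum
    entries themselves are not handled in this sketch and raise ValueError.
    """
    n = len(c_full) - 1      # number of data rows
    m = len(c_full[0]) - 1   # number of data columns
    bad_rows = [i for i in range(n)
                if sum(c_full[i][:m]) != c_full[i][m]]
    bad_cols = [j for j in range(m)
                if sum(c_full[i][j] for i in range(n)) != c_full[n][j]]
    if not bad_rows and not bad_cols:
        return None
    if len(bad_rows) == 1 and len(bad_cols) == 1:
        i, j = bad_rows[0], bad_cols[0]
        delta = sum(c_full[i][:m]) - c_full[i][m]
        return i, j, delta
    raise ValueError("error pattern is not a single data-element fault")


# Example: multiply with checksums, inject one fault, then correct it.
a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
c_full = matmul(column_checksum(a), row_checksum(b))
c_full[1][0] += 7                      # simulated transient fault
i, j, delta = locate_single_error(c_full)
c_full[i][j] -= delta                  # single-error correction
```

The product of a column-checksum matrix and a row-checksum matrix is itself a full-checksum matrix, so a single wrong data element shows up as exactly one inconsistent row and one inconsistent column. The extra checksum row and column are the kind of hardware and computation overhead that the abstract says the reliability model takes into account.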