Diagnosability and Diagnosis of Algorithm-Based Fault-Tolerant Systems

  • Authors: B. Vinnakota; N. K. Jha

  • Venue: IEEE Transactions on Computers
  • Year: 1993

Abstract

Parallel processing architectures are commonly used for signal processing and other computationally intensive applications. These applications are characterized by high throughput and long processing periods. Such characteristics decrease the reliability of high-performance architectures. The erroneous data produced by faulty processors could have damaging consequences, particularly in critical real-time applications. It is therefore desirable that any erroneous data produced by the system be detected and located as quickly as possible. Algorithm-based fault tolerance (ABFT) is a low-cost system-level concurrent error detection and fault location scheme. Methods used in the analysis of multiprocessor systems using system-level diagnosis are applied to the analysis of ABFT systems. A new algorithm for analyzing an ABFT system for its fault diagnosability is developed using these methods. Based on this work, a fault diagnosis algorithm is developed for ABFT systems.
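
To make the idea of concurrent error detection and fault location concrete, the sketch below shows the classic checksum-matrix form of ABFT for matrix multiplication (the scheme usually attributed to Huang and Abraham). It is an illustration only and is not the diagnosability or diagnosis algorithm of this paper, which the abstract does not spell out. The function names (matmul, add_column_checksums, add_row_checksums, locate_error) are hypothetical, and the sketch assumes integer data and at most one erroneous result element.

```python
def matmul(a, b):
    """Plain matrix product of two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def add_column_checksums(a):
    """Append one extra row holding the sum of each column of a."""
    return a + [[sum(col) for col in zip(*a)]]


def add_row_checksums(b):
    """Append to each row of b one extra entry holding that row's sum."""
    return [row + [sum(row)] for row in b]


def locate_error(c_full):
    """Check a full-checksum product matrix.

    Returns None if every row and column check passes, or (row, col) of the
    single inconsistent data element. Assumes exact (integer) arithmetic;
    floating-point data would need a tolerance instead of equality.
    """
    n = len(c_full) - 1       # number of data rows
    m = len(c_full[0]) - 1    # number of data columns
    bad_rows = [i for i in range(n)
                if sum(c_full[i][:m]) != c_full[i][m]]
    bad_cols = [j for j in range(m)
                if sum(c_full[i][j] for i in range(n)) != c_full[n][j]]
    if not bad_rows and not bad_cols:
        return None
    if len(bad_rows) == 1 and len(bad_cols) == 1:
        return (bad_rows[0], bad_cols[0])
    raise ValueError("error pattern not locatable by this simple check")


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]

# The product of the column-checksum and row-checksum matrices carries
# checksums in its last row and last column.
c_full = matmul(add_column_checksums(a), add_row_checksums(b))
clean_result = locate_error(c_full)

# Simulate a faulty processor corrupting one result element.
c_full[1][0] += 5
faulty_result = locate_error(c_full)
```

In this sketch the failing row check and the failing column check intersect at the corrupted element, which is what makes location (not just detection) possible. Which processor computed that element, and how many simultaneous faults can be told apart, depends on how data and checks are mapped to processors; that mapping question is the kind of system-level diagnosability analysis the abstract refers to.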