Design of Algorithm-Based Fault-Tolerant Multiprocessor Systems for Concurrent Error Detection and Fault Diagnosis

  • Authors:
  • V. Vinnakota; N. K. Jha

  • Venue:
  • IEEE Transactions on Parallel and Distributed Systems
  • Year:
  • 1994

Abstract

Algorithm-based fault tolerance (ABFT) is a low-overhead system-level concurrent error detection and fault location scheme for multiprocessor systems. We present new methods for the design of ABFT systems. Our design procedure is applicable to a wide range of systems in which processors share data elements. A feature of our design approach is that the type of checks to be used in the final system can be controlled by the system designer. We also present some new bounds on the number of checks needed in ABFT system design.