Algorithm-Based Fault Tolerance (ABFT) is a well-known technique for fault and error detection in multiprocessor systems. We examine several issues concerning ABFT systems when data flow information for the underlying multiprocessor computation is available. Our results show that this finer-grained information can be exploited to obtain test schemes that use fewer checks, in some cases dramatically fewer. We address both the analysis and the design of ABFT systems in this setting. The analysis problem for a given ABFT system is to determine its fault detectability and fault locatability, that is, the maximum number of faulty processors that can be detected and located, respectively. We show that the analysis problem can be solved efficiently when the number of faults is fixed, and we address its computational difficulty when the number of faults is not fixed. The design problem is to construct a minimal collection of checks that can detect or locate a specified number of faults for a given multiprocessor computation. For some special classes of data flow graphs, we establish upper and lower bounds on the number of checks needed to detect or locate a given number of faults. We also address the computational difficulty of the design problem in several cases.
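To make the analysis problem concrete, the following is a minimal sketch under a deliberately simplified model that is an assumption of this example, not the model used in the paper. Each check is taken to be a set of processors, and a check is assumed to fire only when it covers between 1 and `h` faulty processors, so that more than `h` faults within one check can mask each other. The names `detects`, `is_t_detectable`, `fault_detectability`, and the parameter `h` are all hypothetical. For a fixed number of faults `t`, the brute-force test enumerates all fault sets of size at most `t`, which is polynomial in the number of processors; when `t` is not fixed, the same enumeration becomes exponential.

```python
from itertools import combinations


def detects(check, faulty, h):
    """Assumed masking rule: a check fires iff it covers 1..h faulty processors."""
    hit = len(check & faulty)
    return 1 <= hit <= h


def is_t_detectable(processors, checks, t, h=1):
    """True if every nonempty fault set of size <= t makes at least one check fire."""
    for size in range(1, t + 1):
        for faulty in combinations(processors, size):
            fault_set = frozenset(faulty)
            if not any(detects(check, fault_set, h) for check in checks):
                return False  # this fault set escapes every check
    return True


def fault_detectability(processors, checks, h=1):
    """Largest t for which the system is t-fault detectable (0 if none)."""
    processors = list(processors)
    t = 0
    while t < len(processors) and is_t_detectable(processors, checks, t + 1, h):
        t += 1
    return t


if __name__ == "__main__":
    # Hypothetical example: four processors, each check covers two neighbours.
    procs = [0, 1, 2, 3]
    ring_checks = [{0, 1}, {1, 2}, {2, 3}, {3, 0}]
    print(fault_detectability(procs, ring_checks, h=1))
```

In this toy instance, the only fault set that escapes detection is the one in which all four processors fail, because every check then covers two faulty processors and is masked under the assumed rule.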