Considers the applicability of algorithm-based fault tolerance (ABFT) to massively parallel scientific computation. Existing ABFT schemes can provide effective fault tolerance at a low cost for computation on matrices of moderate size; however, the methods do not scale well to floating-point operations on large systems. This short note proposes the use of a partitioned linear encoding scheme to provide scalability. Matrix algorithms employing this scheme are presented and compared to current ABFT schemes. It is shown that the partitioned scheme provides scalable linear codes with improved numerical properties with only a small increase in hardware and time overhead.
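
The abstract does not spell out the encoding, so the following is only a minimal sketch of the general idea, assuming the classic column-checksum encoding for matrix multiplication with one checksum row per group of rows instead of one for the whole matrix. The function names, the block size, and the tolerance are illustrative assumptions, not the paper's notation or algorithm. Because each checksum then sums only `block` values, the rounding error accumulated in a checksum should stay bounded as the matrix grows, which is one plausible reading of the "improved numerical properties" claim.

```python
def matmul(a, b):
    """Plain matrix product on lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def block_checksums(m, block):
    """One checksum row (column sums) per group of `block` consecutive rows."""
    return [[sum(col) for col in zip(*m[start:start + block])]
            for start in range(0, len(m), block)]


def faulty_blocks(c, expected, block, tol=1e-9):
    """Indices of row blocks of c whose column sums disagree with `expected`.

    `tol` is a hypothetical relative tolerance for floating-point round-off.
    """
    actual = block_checksums(c, block)
    return [k for k, (got, want) in enumerate(zip(actual, expected))
            if any(abs(g - w) > tol * max(1.0, abs(w))
                   for g, w in zip(got, want))]


a = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
b = [[1.0, 0.5], [0.25, 2.0]]
block = 2  # assumed partition size: two rows per checksum

# Linearity: (checksums of A) * B equals the checksums of (A * B).
expected = matmul(block_checksums(a, block), b)

c = matmul(a, b)
clean = faulty_blocks(c, expected, block)    # no fault injected yet

c[2][1] += 1.0                               # simulate a fault in the second row block
flagged = faulty_blocks(c, expected, block)  # only that block should be reported
```

In this sketch a mismatch also narrows the fault down to one partition, at the price of one extra checksum row per partition rather than one per matrix.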