Bounds on Algorithm-Based Fault Tolerance in Multiple Processor Systems
IEEE Transactions on Computers - The MIT Press scientific computation series
Solving problems on concurrent processors
Solving problems on concurrent processors
A Linear Algebraic Model of Algorithm-Based Fault Tolerance
IEEE Transactions on Computers
A novel approach to system-level fault tolerance in hypercube multiprocessors
C3P Proceedings of the third conference on Hypercube concurrent computers and applications: Architecture, software, computer systems, and general issues - Volume 1
Compiler-Assisted Synthesis of Algorithm-Based Checking in Multiprocessors
IEEE Transactions on Computers
Algorithm-Based Fault Detection for Signal Processing Applications
IEEE Transactions on Computers
On Stable Parallel Linear System Solvers
Journal of the ACM (JACM)
Fault-secure algorithms for multiple-processor systems
ISCA '84 Proceedings of the 11th annual international symposium on Computer architecture
Compiler-Assisted Synthesis of Algorithm-Based Checking in Multiprocessors
IEEE Transactions on Computers
Algorithm-Based Error-Detection Schemes for Iterative Solution of Partial Differential Equations
IEEE Transactions on Computers
IEEE Transactions on Computers
An Efficient Algorithm-Based Fault Tolerance Design Using the Weighted Data-Check Relationship
IEEE Transactions on Computers
Nest: A Nested-Predicate Scheme for Fault Tolerance
IEEE Transactions on Computers
An Algorithm-Based Error Detection Scheme for the Multigrid Method
IEEE Transactions on Computers
Hi-index | 0.02 |
The authors provide an in-depth study of the various issues and tradeoffs available in algorithm-based error detection, as well as a general methodology for evaluating the schemes. They illustrate the approach on an extremely useful computation in the field of numerical linear algebra: QR factorization. They have implemented and investigated numerous ways of applying algorithm-based error detection using different system-level encoding strategies for QR factorization. Specifically, schemes based on the checksum and sum-of-squares (SOS) encoding techniques have been developed. The results of studies performed on a 16-processor Intel iPSC-2/D4/MX hypercube multiprocessor are reported. It is shown that, in general, the SOS approach gives much better coverage (85-100%) for QR factorization while maintaining low overheads (below 10%).