A linear algebraic interpretation is developed for previously proposed algorithm-based fault tolerance schemes. The concepts of distance, code space, and the definitions of detection and correction in the vector space R^n are explained. The number of errors that can be detected or corrected for a distance-(d+1) code is derived. It is shown why the correction scheme does not work for general weight vectors, and a novel fast-correction algorithm for a weighted distance-5 code is derived.
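
To make the ideas of code space and syndrome-based correction concrete, here is a minimal Python sketch of a simpler, textbook-style weighted checksum code over real vectors. It is not the distance-5 fast-correction algorithm described in the abstract. It assumes position weights 1, 2, ..., n and at most one corrupted entry, and the function names (`encode`, `syndromes`, `correct_single_error`) are hypothetical, chosen only for illustration.

```python
def encode(x):
    """Append two checksums: the plain sum and the position-weighted sum."""
    s1 = sum(x)
    s2 = sum((i + 1) * v for i, v in enumerate(x))  # assumed weights 1..n
    return list(x) + [s1, s2]


def syndromes(c):
    """Return (d1, d2): recomputed checksums minus the stored checksums."""
    data, s1, s2 = c[:-2], c[-2], c[-1]
    d1 = sum(data) - s1
    d2 = sum((i + 1) * v for i, v in enumerate(data)) - s2
    return d1, d2


def correct_single_error(c, tol=1e-9):
    """Correct at most one corrupted entry of an encoded vector.

    Returns (corrected_vector, index_of_error or None).
    Raises ValueError if the syndromes do not match a single-error pattern.
    """
    d1, d2 = syndromes(c)
    fixed = list(c)
    n = len(c) - 2
    if abs(d1) <= tol and abs(d2) <= tol:
        return fixed, None                 # vector lies in the code space
    if abs(d2) <= tol:
        fixed[-2] += d1                    # only the plain checksum is off
        return fixed, n
    if abs(d1) <= tol:
        fixed[-1] += d2                    # only the weighted checksum is off
        return fixed, n + 1
    # A data error e at 1-based position k gives d1 = e and d2 = k * e.
    k = d2 / d1
    pos = round(k)
    if abs(k - pos) > 1e-6 or not 1 <= pos <= n:
        raise ValueError("syndromes inconsistent with a single error")
    fixed[pos - 1] -= d1
    return fixed, pos - 1


if __name__ == "__main__":
    codeword = encode([3.0, 1.0, 4.0, 1.0, 5.0])
    corrupted = list(codeword)
    corrupted[2] += 7.5                    # inject one error
    print(correct_single_error(corrupted))
```

The ratio of the two syndromes locates the error only because the assumed weights are distinct and nonzero. With real floating-point data the tolerances would need to account for roundoff, which this sketch does not attempt.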