An Algorithm Based Error Detection Scheme for the Multigrid Algorithm
FTCS '99 Proceedings of the Twenty-Ninth Annual International Symposium on Fault-Tolerant Computing
Algorithm-based fault tolerance (ABFT) is a technique that provides system-level error detection and correction on array processors and multiprocessors at low cost. Since the early 1980s, the technique has been applied extensively to linear algebra algorithms such as matrix multiplication, Gaussian elimination, QR factorization, and singular value decomposition. However, an important class of problems in numerical linear algebra has been overlooked: the iterative solution of the linear algebraic equations that arise from finite difference or finite element discretization of a partial differential equation. The only exception is the recent application of algorithm-based error detection (ABED) encodings to the successive overrelaxation algorithm for Laplace's equation. In this paper, ABED is applied to a multigrid algorithm for the iterative solution of a Poisson equation in two dimensions. Invariants are created to implement checking in the relaxation, restriction, and interpolation operators. Roundoff errors accumulated within the operators perturb the invariants and often cause false alarms; this is addressed by deriving expressions for the roundoff errors in the algebraic processes of each operator and correcting the invariants accordingly. The ABED-encoded multigrid algorithm is shown to be insensitive to the size and range of the input data, and it provides excellent error coverage at low latency for floating-point, integer, and memory errors.
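The idea of checking an operator against an invariant with a roundoff-aware threshold can be illustrated with a small sketch. This is not the encoding derived in the paper. It assumes a plain Jacobi relaxation sweep for the two-dimensional Poisson equation on a square grid with fixed boundary values, a single grid-sum checksum as the invariant, and a simple worst-case roundoff bound as the threshold. The names `jacobi_sweep`, `interior_sum`, `check_sweep`, and the `fault` hook are hypothetical and exist only for this illustration.

```python
import sys


def jacobi_sweep(u, f, h, fault=None):
    """One Jacobi sweep for -laplace(u) = f on the interior of a square grid.

    `fault` is a hypothetical hook: (i, j, delta) adds delta to one updated
    value to mimic a transient arithmetic error.
    """
    n = len(u)
    new = [row[:] for row in u]  # boundary values are carried over unchanged
    for i in range(1, n - 1):
        for j in range(1, n - 1):
            new[i][j] = 0.25 * (u[i - 1][j] + u[i + 1][j]
                                + u[i][j - 1] + u[i][j + 1]
                                + h * h * f[i][j])
    if fault is not None:
        i, j, delta = fault
        new[i][j] += delta
    return new


def interior_sum(a, di=0, dj=0, absolute=False):
    """Sum a[i + di][j + dj] (or its absolute value) over interior (i, j)."""
    n = len(a)
    total = 0.0
    for i in range(1, n - 1):
        for j in range(1, n - 1):
            v = a[i + di][j + dj]
            total += abs(v) if absolute else v
    return total


def check_sweep(u, f, h, new):
    """Compare the checksum of `new` with the value predicted from `u`, `f`.

    Invariant: the sum of the updated interior values equals one quarter of
    the four shifted sums of the old grid plus h^2 times the sum of f.
    """
    shifts = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    predicted = 0.25 * (sum(interior_sum(u, di, dj) for di, dj in shifts)
                        + h * h * interior_sum(f))
    observed = interior_sum(new)

    # Assumed worst-case roundoff threshold: it grows with the number of
    # interior points and with the magnitude of the summed data, so the
    # check scales with both the size and the range of the input.
    magnitude = 0.25 * (sum(interior_sum(u, di, dj, absolute=True)
                            for di, dj in shifts)
                        + h * h * interior_sum(f, absolute=True))
    points = (len(u) - 2) ** 2
    tol = 8 * points * sys.float_info.epsilon * magnitude

    residual = observed - predicted
    return abs(residual) <= tol, residual, tol


n = 9
h = 1.0 / (n - 1)
u = [[0.1 * ((3 * i + 5 * j) % 7) for j in range(n)] for i in range(n)]
f = [[1.0 for _ in range(n)] for _ in range(n)]

clean = jacobi_sweep(u, f, h)
ok_clean, _, _ = check_sweep(u, f, h, clean)

faulty = jacobi_sweep(u, f, h, fault=(4, 4, 1e-6))
ok_faulty, _, _ = check_sweep(u, f, h, faulty)

print(ok_clean, ok_faulty)
```

In this sketch the fault-free sweep passes the check, while the sweep with one value perturbed by `1e-6` is flagged. A fixed absolute threshold would instead raise false alarms on large grids or large-magnitude data, which is why the threshold here is tied to the data. A single global checksum is the coarsest possible choice; it detects an error but does not locate it.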