This paper proposes a new fault detection mechanism for the computation of eigenvalues and eigenvectors, the so-called eigenproblem, for which, to the best of our knowledge, no such scheme existed before. It consists of a number of assertions that can be executed on the results of the computation to determine their correctness. The proposed scheme follows the Result Checking principle, since it does not depend on the particular numerical algorithm used. It can handle both real and complex matrices, symmetric or not. Many practical issues are handled, such as rounding errors and eigenvalue ordering, and a practical implementation was built on top of unmodified routines of the well-known LAPACK library. The proposed scheme is very efficient, with less than 2% performance overhead for medium to large matrices. It is also very effective: when subjected to extensive fault-injection experiments, it exhibited a fault coverage greater than 99.7% with a confidence level of 99%. Finally, it is very easy to adapt to other libraries of mathematical routines besides LAPACK.
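The abstract does not list the assertions themselves, so the following is only a minimal sketch of what a result check for the eigenproblem might look like, not the paper's actual scheme. It assumes two checks chosen for illustration: the residual A v - lambda v of each returned pair must be small, and the eigenvalues must sum to the trace of A. The function name `check_eigen_result`, the parameter `tol_factor`, and the tolerance formula are hypothetical. NumPy is used because its `numpy.linalg.eig` calls LAPACK internally.

```python
import numpy as np

def check_eigen_result(a, eigvals, eigvecs, tol_factor=100.0):
    """Hypothetical result checker: True if (eigvals, eigvecs) pass both assertions.

    The checker only looks at the input matrix and the outputs, so it is
    independent of the algorithm that produced them.
    """
    a = np.asarray(a)
    eigvals = np.asarray(eigvals)
    eigvecs = np.asarray(eigvecs)
    n = a.shape[0]

    # Assumed tolerance: scaled with machine epsilon, matrix size and norm,
    # so that ordinary rounding errors are not flagged as faults.
    eps = np.finfo(np.result_type(a.dtype, np.float64)).eps
    tol = tol_factor * n * eps * max(np.linalg.norm(a, 1), 1.0)

    # Reject NaN or infinite outputs outright.
    if not (np.all(np.isfinite(eigvals)) and np.all(np.isfinite(eigvecs))):
        return False

    # Assertion 1: each column v_i satisfies A v_i ~= lambda_i v_i.
    # Broadcasting scales column i of eigvecs by eigvals[i].
    residual = a @ eigvecs - eigvecs * eigvals
    col_norms = np.linalg.norm(eigvecs, axis=0)
    res_norms = np.linalg.norm(residual, axis=0)
    if np.any(col_norms == 0.0) or np.any(res_norms > tol * col_norms):
        return False

    # Assertion 2: the eigenvalues sum to the trace of A. This catches
    # results where one valid pair is returned in place of another.
    return bool(abs(np.sum(eigvals) - np.trace(a)) <= tol)

# Usage: check the output of an unmodified library routine.
a = np.array([[2.0, 1.0], [1.0, 2.0]])
w, v = np.linalg.eig(a)
result_ok = check_eigen_result(a, w, v)
```

Because the residual and trace checks cost one matrix product and a few norms, they are cheap relative to the eigensolver itself for large matrices, which is consistent with the low overhead the abstract reports; the exact figures depend on the real assertions and are not reproduced by this sketch.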