Algorithm-Based Fault Tolerance (ABFT) is the collective name for a set of techniques used to check the correctness of certain mathematical calculations. A lesser-known alternative is Result Checking (RC), in which, contrary to ABFT, results are checked without knowledge of the particular algorithm used to calculate them.

This paper compares the two using practical implementations of matrix computations. The criteria are performance and memory overhead, ease of use, and error coverage. For the latter, extensive error-injection experiments were performed. To the best of our knowledge, this is the first time that RC has been validated by fault injection.

We conclude that Result Checking has the important advantage of being independent of the underlying algorithm. It also generally has less performance overhead than ABFT, and the two techniques are essentially equivalent in terms of error coverage.
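The difference between the two approaches can be illustrated with matrix multiplication. The abstract does not say which checkers or checksum schemes the paper evaluates, so the following is only a minimal sketch under stated assumptions: Freivalds' randomized test stands in for a result checker, a row/column checksum scheme stands in for ABFT, and the function names (`rc_check`, `abft_check`) are hypothetical. Integer matrices are used so that comparisons are exact; floating-point data would need a tolerance instead of equality.

```python
import random


def matmul(a, b):
    """Plain triple-loop product; stands in for any multiplier under test."""
    inner, cols = len(b), len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for row in a]


def matvec(a, x):
    """Matrix-vector product."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


def rc_check(a, b, c, rounds=20, rng=random):
    """Result checking (assumed here: Freivalds' test).

    Accepts c as a*b if a(b r) == c r for several random 0/1 vectors r.
    The check only sees the inputs and the result, not how c was computed.
    A wrong c slips through one round with probability at most 1/2.
    """
    width = len(b[0])
    for _ in range(rounds):
        r = [rng.randint(0, 1) for _ in range(width)]
        if matvec(a, matvec(b, r)) != matvec(c, r):
            return False
    return True


def abft_check(a, b, multiply=matmul):
    """ABFT (assumed here: a row/column checksum scheme).

    The checksums are woven into the computation itself: a gets an extra
    row of column sums, b an extra column of row sums, and the product of
    the extended matrices must carry consistent checksums.
    """
    a_ext = a + [[sum(col) for col in zip(*a)]]
    b_ext = [row + [sum(row)] for row in b]
    full = multiply(a_ext, b_ext)
    rows_ok = all(sum(row[:-1]) == row[-1] for row in full)
    cols_ok = all(sum(col[:-1]) == col[-1] for col in zip(*full))
    body = [row[:-1] for row in full[:-1]]  # strip checksum row and column
    return body, rows_ok and cols_ok
```

In this sketch `rc_check` can be applied to the output of any multiplication routine, whereas `abft_check` has to wrap the routine and extend its operands, which mirrors the algorithm-independence point made in the conclusion.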