Software-Implemented Fault Detection for High-Performance Space Applications

Authors:
Michael Turmon;Robert Granat;Daniel S. Katz
Venue:
DSN '00 Proceedings of the 2000 International Conference on Dependable Systems and Networks (formerly FTCS-30 and DCCA-8)
Year:
2000

Abstract

We describe and test a software approach to overcoming radiation-induced errors in spaceborne applications running on commercial off-the-shelf components. The approach uses checksum methods to validate results returned by a numerical subroutine operating subject to unpredictable errors in data. We can treat subroutines that return results satisfying a necessary condition having a linear form; the checksum tests compliance with this condition. We discuss the theory and practice of setting numerical tolerances to separate errors caused by a fault from those inherent in finite-precision numerical calculations. We test both the general effectiveness of the linear fault-tolerant schemes we propose and the correct behavior of our parallel implementation of them.
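
To make the checksum idea concrete, the following is a minimal sketch, not the authors' implementation. It takes matrix multiplication as the numerical subroutine and uses the linear necessary condition C w = A (B w) for a weight vector w. The function names (`matmul`, `checksum_ok`), the `safety` factor, and the specific tolerance formula (machine epsilon scaled by the problem size and by infinity norms of the inputs) are all assumptions chosen for illustration; the abstract does not state which tolerance the paper derives.

```python
import random
import sys


def matmul(a, b):
    """Plain product of two matrices stored as lists of rows (the routine under test)."""
    inner, cols = len(b), len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for row in a]


def matvec(a, x):
    """Matrix-vector product."""
    return [sum(v * xi for v, xi in zip(row, x)) for row in a]


def inf_norm_vec(x):
    """Largest absolute entry of a vector."""
    return max(abs(v) for v in x)


def inf_norm_mat(a):
    """Largest absolute row sum of a matrix."""
    return max(sum(abs(v) for v in row) for row in a)


def checksum_ok(a, b, c, w, safety=10.0):
    """Check the linear condition c w == a (b w) up to a rounding tolerance.

    The tolerance is an illustrative assumption: safety * n * eps * |a| * |b| * |w|.
    It is meant to sit above ordinary rounding error and below a typical fault.
    """
    lhs = matvec(c, w)             # checksum of the claimed result
    rhs = matvec(a, matvec(b, w))  # same quantity from the inputs, two cheap products
    delta = inf_norm_vec([l - r for l, r in zip(lhs, rhs)])
    n = len(b)
    eps = sys.float_info.epsilon
    tau = safety * n * eps * inf_norm_mat(a) * inf_norm_mat(b) * inf_norm_vec(w)
    return delta <= tau


random.seed(0)
n = 8
a = [[random.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n)]
b = [[random.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n)]
w = [1.0 + j / n for j in range(n)]  # fixed nonzero weights (hypothetical choice)

c = matmul(a, b)
print("fault-free result passes:", checksum_ok(a, b, c, w))

c[2][1] += 1e-3  # crude stand-in for a radiation-induced upset in one entry
print("corrupted result passes:", checksum_ok(a, b, c, w))
```

In this sketch the fault-free product should pass and the corrupted one should be flagged, since the injected error is many orders of magnitude above the rounding-level tolerance. A fault smaller than the tolerance would go undetected, which is the trade-off the abstract refers to when it discusses separating fault-induced errors from those inherent in finite-precision arithmetic.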