Fast online error detection and correction with thread signature calculae

Authors:
Bernhard Fechner
Affiliations:
University of Augsburg, 86159 Augsburg, Germany
Venue:
Microprocessors & Microsystems
Year:
2012

Abstract

To recognize transient control-flow and data faults caused by Single-Event Upsets (SEUs) in a microprocessor pipeline, several mechanisms that check execution at retirement have been proposed and discussed over the years. In this paper, we suggest a checksum scheme with compression-based and compression-free variants. It recognizes transient faults before commitment and preserves binary compatibility. The scheme is applicable to time-redundant systems (virtual duplex and redundantly multithreaded systems) as well as to structurally redundant systems. It can localize a fault by partial re-execution within the pipeline. By additionally introducing a modified micro-rollback, single or multiple pipeline stages can be rolled back for a retry. In the best case, a fault can be localized, detected, and corrected within four clock cycles in a fine-grained redundantly threaded microprocessor. We validate and analyze the scheme through FPGA and standard-cell implementations and conclude that it can replace the well-known parity computation in high-performance designs.
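The general idea of comparing redundant executions before commit and retrying on a mismatch can be illustrated with a small software model. This is a minimal sketch under stated assumptions, not the author's hardware scheme. Each redundant thread's retired `(pc, result)` pairs are compressed into a 32-bit signature, and CRC-32 (`zlib.crc32`) is a swapped-in stand-in for the paper's checksum. The two signatures are compared before the window commits. On a mismatch, both uncommitted traces are discarded and the window is re-executed, which loosely models a micro-rollback. The names `signature`, `commit_with_retry`, `run_window`, and `demo_window` are hypothetical. The sketch does not model fault localization by partial re-execution. A compression-free variant would compare the raw traces (`leading == trailing`) instead of their signatures.

```python
import zlib
from typing import Callable, List, Tuple

Retired = List[Tuple[int, int]]  # (pc, result) pairs retired by one thread


def signature(trace: Retired) -> int:
    """Compress a retired trace into a 32-bit signature.

    CRC-32 is an assumption standing in for the paper's checksum; folding in
    the pc covers control flow, folding in the result covers data.
    """
    sig = 0
    for pc, result in trace:
        data = pc.to_bytes(4, "little") + (result & 0xFFFFFFFF).to_bytes(4, "little")
        sig = zlib.crc32(data, sig)  # continue the running CRC over the trace
    return sig


def commit_with_retry(run_window: Callable[[int, int], Retired],
                      max_retries: int = 3) -> Tuple[Retired, int]:
    """Run one instruction window on two redundant threads and commit only
    if their signatures agree; otherwise roll back and re-execute.

    run_window(thread_id, attempt) is a hypothetical hook returning the
    retired trace of that thread. Returns (committed trace, retries used).
    """
    for attempt in range(max_retries + 1):
        leading = run_window(0, attempt)
        trailing = run_window(1, attempt)
        if signature(leading) == signature(trailing):
            return leading, attempt  # signatures match: safe to commit
        # Mismatch: drop both uncommitted traces (rollback) and retry.
    raise RuntimeError("persistent mismatch: fault does not look transient")


def demo_window(thread_id: int, attempt: int) -> Retired:
    """Hypothetical window of four instructions; result = pc * 10."""
    trace = [(pc, pc * 10) for pc in range(0, 16, 4)]
    if thread_id == 1 and attempt == 0:
        pc, res = trace[2]
        trace[2] = (pc, res ^ 0b1000)  # simulated SEU: flip bit 3 of one result
    return trace


committed, retries = commit_with_retry(demo_window)
print(retries)  # 1
```

The injected bit flip makes the first comparison fail. The retry then re-executes the window fault-free, so it commits after one rollback. A fault that reappears on every attempt exhausts `max_retries` and is reported rather than committed.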