High-Performance Fault-Tolerant VLSI Systems Using Micro Rollback
IEEE Transactions on Computers
The art of computer programming, volume 2 (3rd ed.): seminumerical algorithms
The art of computer programming, volume 2 (3rd ed.): seminumerical algorithms
Terrestrial cosmic ray intensities
IBM Journal of Research and Development
DIVA: a reliable substrate for deep submicron microarchitecture design
Proceedings of the 32nd annual ACM/IEEE international symposium on Microarchitecture
Detailed design and evaluation of redundant multithreading alternatives
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
A Fault Tolerant Approach to Microprocessor Design
DSN '01 Proceedings of the 2001 International Conference on Dependable Systems and Networks (formerly: FTCS)
Modeling the Effect of Technology Trends on the Soft Error Rate of Combinational Logic
DSN '02 Proceedings of the 2002 International Conference on Dependable Systems and Networks
A Study of the Error Behavior of a 32-bit RISC Subjected to Simulated Transient Fault Injection
Proceedings of the IEEE International Test Conference on Discover the New World of Test and Design
Fault tolerance in adaptive real-time computing systems
Fault tolerance in adaptive real-time computing systems
Characterizing the Effects of Transient Faults on a High-Performance Processor Pipeline
DSN '04 Proceedings of the 2004 International Conference on Dependable Systems and Networks
Fault-Tolerance Techniques for SRAM-Based FPGAs (Frontiers in Electronic Testing)
Fault-Tolerance Techniques for SRAM-Based FPGAs (Frontiers in Electronic Testing)
DRAM errors in the wild: a large-scale field study
Proceedings of the eleventh international joint conference on Measurement and modeling of computer systems
On-line error detection and fast recover techniques for dependable embedded processors
On-line error detection and fast recover techniques for dependable embedded processors
Analysis of checksum-based execution schemes for pipelined processors
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
Hi-index | 0.00 |
To recognize transient control-flow and data faults, caused by Single-Event Upsets (SEUs) in a microprocessor pipeline, several mechanisms to check the execution in the retirement have been proposed and discussed over the years. In this paper, we suggest a compression-based and compression-free checksum-scheme, which is able to recognize transient faults before commitment and preserves binary compatibility. The scheme is applicable for time-redundant (virtual duplex and redundantly multithreaded systems) as well as structural redundant systems. It can localize a fault by partial re-execution within the pipeline. By additionally introducing a modified micro-rollback, single or multiple pipeline stages can be rolled back for a retry. In the best case, a fault can be localized, detected and corrected in four clock cycles within a fine-grained redundantly threaded microprocessor. We validate and analyze the scheme through an FPGA and standard-cell implementation and conclude that it is able to replace the well-known parity-computation for high-performance designs.