Fault Tolerance through Re-Execution in Multiscalar Architecture

Authors:
Faisal Rashid;Kewal K. Saluja;Parameswaran Ramanathan
Affiliations:
-;-;-
Venue:
DSN '00 Proceedings of the 2000 International Conference on Dependable Systems and Networks (formerly FTCS-30 and DCCA-8)
Year:
2000

Citing 0
Cited 11

On-line fault detection in a hardware/software co-design environment: system partitioning
Proceedings of the 14th international symposium on Systems synthesis

Reliability Properties Assessment at System Level: A Co-Design Framework
Journal of Electronic Testing: Theory and Applications

A Watchdog Processor Architecture with Minimal Performance Overhead
SAFECOMP '02 Proceedings of the 21st International Conference on Computer Safety, Reliability and Security

REESE: A Method of Soft Error Detection in Microprocessors
DSN '01 Proceedings of the 2001 International Conference on Dependable Systems and Networks (formerly: FTCS)

A Method to Enhance the Fault Coverage Obtained by Output Response Comparison of Identical Circuits
ITC '01 Proceedings of the 2001 IEEE International Test Conference

A Case for Clumsy Packet Processors
Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture

Increasing Register File Immunity to Transient Errors
Proceedings of the conference on Design, Automation and Test in Europe - Volume 1

Concurrent Online Testing of Identical Circuits Using Nonidentical Input Vectors
IEEE Transactions on Dependable and Secure Computing

Reliable data path design of VLIW processor cores with comprehensive error-coverage assessment
Microprocessors & Microsystems

Time-Constraint-Aware Optimization of Assertions in Embedded Software
Journal of Electronic Testing: Theory and Applications

Configurable fault-tolerance for a configurable VLIW processor
ARC'13 Proceedings of the 9th international conference on Reconfigurable Computing: architectures, tools, and applications

Abstract

Multi-threading and multiscalar execution are two fundamental microarchitecture approaches that are expected to keep processors on the existing performance gain curve. Both approaches assume that integrated circuits with over a billion transistors will become available in the near future. Such large integrated circuits imply reduced design tolerances and hence an increased probability of failure. Conventional hardware redundancy techniques for achieving the desired reliability in computation may severely limit the performance of such high-performance processors. Hence, we need to study novel methods that exploit the inherent redundancy of these microarchitectures, without unduly affecting performance, to provide correct program execution and/or detect failures (permanent or transient) that can occur in the hardware.

This paper proposes a time redundancy technique suitable for multiscalar architectures. A multiscalar architecture usually has several processing units to exploit the instruction-level parallelism that exists in a given program. The technique uses a majority of the processing units to execute the program as in the traditional multiscalar paradigm, while the remaining processing units re-execute the committed instructions. By comparing the results from the two program executions, errors caused by permanent or transient faults in the processing units can be detected. Simulation results presented in this paper demonstrate that this can be achieved with about 5-15% performance degradation.
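
The re-execute-and-compare idea described in the abstract can be illustrated with a toy sketch. This is not the paper's simulator: the instruction format, the function names (`execute`, `run_with_reexecution`), and the single-bit-flip fault model are all assumptions made for illustration, and the sketch does not model multiscalar tasks, timing, or the reported performance overhead.

```python
import operator

# Toy ALU operations; this instruction format is hypothetical.
OPS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}

def execute(op, a, b, flip_bit=None):
    """Compute op(a, b); optionally flip one result bit to model a fault."""
    result = OPS[op](a, b)
    if flip_bit is not None:
        result ^= 1 << flip_bit
    return result

def run_with_reexecution(program, faulty_index=None):
    """Run each instruction on a 'primary' unit, then re-execute the
    committed instruction on a separate 'checker' unit and compare.
    Returns the indices of instructions whose two results disagree."""
    mismatches = []
    for i, (op, a, b) in enumerate(program):
        # Assumed fault model: bit 0 of the primary result is flipped
        # for the instruction at faulty_index.
        flip = 0 if i == faulty_index else None
        primary = execute(op, a, b, flip_bit=flip)
        checker = execute(op, a, b)  # fault-free re-execution
        if primary != checker:
            mismatches.append(i)
    return mismatches

program = [("add", 2, 3), ("mul", 4, 5), ("sub", 9, 1)]
print(run_with_reexecution(program))                  # prints []
print(run_with_reexecution(program, faulty_index=1))  # prints [1]
```

Because the checker here stands in for a different processing unit from the one that produced the original result, a comparison of this kind could in principle flag a permanent fault in one unit as well as a transient one, which is the property the abstract claims for the technique.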