Instruction-Level Fault Tolerance Configurability

Authors:
Demid Borodin; B. H. Juurlink; Said Hamdioui; Stamatis Vassiliadis
Affiliations:
Computer Engineering Laboratory, Faculty of Electrical Engineering, Mathematics, and Computer Science, Delft University of Technology, Delft, The Netherlands 2628 CD (all four authors)
Venue:
Journal of Signal Processing Systems
Year:
2009

Abstract

Due to modern technology trends such as decreasing feature sizes and lower voltage levels, fault tolerance (FT) is becoming increasingly important in computing systems. Several schemes have been proposed that let a user configure FT at the application level, trading stronger FT for performance or vice versa. In this paper, we propose supporting instruction-level rather than application-level configurability of FT, since different parts of some applications (e.g., multimedia) can have different reliability requirements. Weak or no FT is applied to the less critical parts, which yields time and/or resource gains. These gains can be spent on stronger FT techniques for the more critical parts, thereby increasing the overall reliability. The paper shows how some existing FT techniques can be adapted to support instruction-level FT configurability, how a programmer can specify the desired FT level of the instructions, and how the compiler can manage it automatically. We compare the proposed approach with the existing FT scheme EDDI, which duplicates all instructions, at both the kernel level and the full-application level. For the full application (a JPEG encoder), our approach is applied to only one kernel in order to avoid increasing the programming effort significantly. The simulation results show that both performance and energy consumption improve significantly (by up to 50% at the kernel level and up to 16% at the full-application level), while the fault coverage depends on the application.
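
The trade-off described in the abstract can be illustrated with a toy software analogy. The abstract does not give the details of the authors' mechanism, so the sketch below is not their implementation: it models each "instruction" as a Python callable carrying a hypothetical `critical` flag, re-executes and compares only the flagged ones, and counts executions as a stand-in for time and energy cost. The names `execute`, `FaultDetected`, `PROGRAM`, and `make_glitchy_add` are assumptions made for illustration only.

```python
import operator


class FaultDetected(Exception):
    """Raised when two redundant executions of one operation disagree."""


def execute(program, protect_all=False):
    """Run a list of (op, args, critical) tuples.

    Operations marked critical (or all of them when protect_all is True,
    mimicking duplication of every instruction) are executed twice and
    the two results are compared. Returns (results, execution_count).
    """
    results = []
    executions = 0
    for op, args, critical in program:
        value = op(*args)
        executions += 1
        if critical or protect_all:
            check = op(*args)  # redundant execution used only for comparison
            executions += 1
            if check != value:
                raise FaultDetected(f"mismatch in {op!r}")
        results.append(value)
    return results, executions


# Hypothetical program: one critical operation, three error-tolerant ones.
PROGRAM = [
    (operator.add, (2, 3), True),   # assumed critical, e.g. control or addressing
    (operator.mul, (4, 5), False),  # assumed tolerant, e.g. pixel arithmetic
    (operator.sub, (9, 1), False),
    (operator.mul, (6, 7), False),
]

selective = execute(PROGRAM)                 # duplicate only the critical op
full = execute(PROGRAM, protect_all=True)    # duplicate every op


def make_glitchy_add():
    """Return an adder that is wrong on its first call only (a transient fault)."""
    calls = {"n": 0}

    def glitchy_add(a, b):
        calls["n"] += 1
        return a + b + (1 if calls["n"] == 1 else 0)

    return glitchy_add
```

In this toy program, selective protection performs 5 executions where full duplication performs 8, and both produce the same results when no fault occurs. Passing `make_glitchy_add()` as a critical operation raises `FaultDetected`, whereas the same operation marked non-critical silently returns a wrong value; this mirrors the abstract's point that fault coverage under selective protection depends on which parts of the application are left unprotected.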