IEEE Transactions on Computers
Simultaneous multithreading: maximizing on-chip parallelism
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
IBM experiments in soft fails in computer electronics (1978–1994)
IBM Journal of Research and Development - Special issue: terrestrial cosmic rays and soft errors
DIVA: a reliable substrate for deep submicron microarchitecture design
Proceedings of the 32nd annual ACM/IEEE international symposium on Microarchitecture
Transient fault detection via simultaneous multithreading
Proceedings of the 27th annual international symposium on Computer architecture
A study of slipstream processors
Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
IBM's S/390 G5 Microprocessor Design
IEEE Micro
Deep-Submicron Microprocessor Design Issues
IEEE Micro
Design Challenges of Technology Scaling
IEEE Micro
DSN '00 Proceedings of the 2000 International Conference on Dependable Systems and Networks (formerly FTCS-30 and DCCA-8)
A study of time redundant fault tolerance techniques for superscalar processors
DFT '95 Proceedings of the IEEE International Workshop on Defect and Fault Tolerance in VLSI Systems
AR-SMT: A Microarchitectural Approach to Fault Tolerance in Microprocessors
FTCS '99 Proceedings of the Twenty-Ninth Annual International Symposium on Fault-Tolerant Computing
Self-Checking Circuits versus Realistic Faults in Very Deep Submicron
VTS '00 Proceedings of the 18th IEEE VLSI Test Symposium
Transient-fault recovery using simultaneous multithreading
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
Detailed design and evaluation of redundant multithreading alternatives
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
High-level modeling and FPGA prototyping of microprocessors
FPGA '03 Proceedings of the 2003 ACM/SIGDA eleventh international symposium on Field programmable gate arrays
Transient-fault recovery for chip multiprocessors
Proceedings of the 30th annual international symposium on Computer architecture
Enhancing data cache reliability by the addition of a small fully-associative replication cache
Proceedings of the 18th annual international conference on Supercomputing
Techniques to Reduce the Soft Error Rate of a High-Performance Microprocessor
Proceedings of the 31st annual international symposium on Computer architecture
Proceedings of the 31st annual international symposium on Computer architecture
Efficient Resource Sharing in Concurrent Error Detecting Superscalar Microarchitectures
Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
SWIFT: Software Implemented Fault Tolerance
Proceedings of the international symposium on Code generation and optimization
Increasing Register File Immunity to Transient Errors
Proceedings of the conference on Design, Automation and Test in Europe - Volume 1
Improving java virtual machine reliability for memory-constrained embedded systems
Proceedings of the 42nd annual Design Automation Conference
Design and Evaluation of Hybrid Fault-Detection Systems
Proceedings of the 32nd annual international symposium on Computer Architecture
Opportunistic Transient-Fault Detection
Proceedings of the 32nd annual international symposium on Computer Architecture
Compiler-guided register reliability improvement against soft errors
Proceedings of the 5th ACM international conference on Embedded software
Exploiting Coarse-Grain Verification Parallelism for Power-Efficient Fault Tolerance
Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques
Replication Cache: A Small Fully Associative Cache to Improve Data Cache Reliability
IEEE Transactions on Computers
Software-controlled fault tolerance
ACM Transactions on Architecture and Code Optimization (TACO)
An Integrated Framework for Dependable and Revivable Architectures Using Multicore Processors
Proceedings of the 33rd annual international symposium on Computer Architecture
Proceedings of the 43rd annual Design Automation Conference
Self-checking instructions: reducing instruction redundancy for concurrent error detection
Proceedings of the 15th international conference on Parallel architectures and compilation techniques
Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
Configurable isolation: building high availability systems with commodity multi-core processors
Proceedings of the 34th annual international symposium on Computer architecture
Mechanisms for bounding vulnerabilities of processor structures
Proceedings of the 34th annual international symposium on Computer architecture
Modeling and improving data cache reliability: 1
Proceedings of the 2007 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Transient fault prediction based on anomalies in processor events
Proceedings of the conference on Design, automation and test in Europe
Hierarchical Verification for Increasing Performance in Reliable Processors
Journal of Electronic Testing: Theory and Applications
Efficient fault tolerance in multi-media applications through selective instruction replication
Proceedings of the 2008 workshop on Radiation effects and fault tolerance in nanometer technologies
Anomaly-based fault detection in pervasive computing system
Proceedings of the 5th international conference on Pervasive services
Proceedings of the 17th international conference on Parallel architectures and compilation techniques
Compiler-assisted soft error detection under performance and energy constraints in embedded systems
ACM Transactions on Embedded Computing Systems (TECS)
Sequential element design with built-in soft error resilience
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
End-to-end register data-flow continuous self-test
Proceedings of the 36th annual international symposium on Computer architecture
Multi-execution: multicore caching for data-similar executions
Proceedings of the 36th annual international symposium on Computer architecture
Instruction-Level Fault Tolerance Configurability
Journal of Signal Processing Systems
Selective replication: A lightweight technique for soft errors
ACM Transactions on Computer Systems (TOCS)
Improving chip multiprocessor reliability through code replication
Computers and Electrical Engineering
A cost effective approach for online error detection using invariant relationships
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Modeling soft errors for data caches and alleviating their effects on data reliability
Microprocessors & Microsystems
Design techniques for cross-layer resilience
Proceedings of the Conference on Design, Automation and Test in Europe
Instruction precomputation with memoization for fault detection
Proceedings of the Conference on Design, Automation and Test in Europe
Detecting errors using multi-cycle invariance information
Proceedings of the Conference on Design, Automation and Test in Europe
On the exploitation of narrow-width values for improving register file reliability
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Trade-offs in transient fault recovery schemes for redundant multithreaded processors
HiPC'06 Proceedings of the 13th international conference on High Performance Computing
Resource-Driven optimizations for transient-fault detecting superscalar microarchitectures
ACSAC'05 Proceedings of the 10th Asia-Pacific conference on Advances in Computer Systems Architecture
Proceedings of the 9th conference on Computing Frontiers
iGPU: exception support and speculative execution on GPUs
Proceedings of the 39th Annual International Symposium on Computer Architecture
The Journal of Supercomputing
Dynamic transient fault detection and recovery for embedded processor datapaths
Proceedings of the eighth IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis
Synthesizing Parsimonious Inexact Circuits through Probabilistic Design Techniques
ACM Transactions on Embedded Computing Systems (TECS) - Special Section on Probabilistic Embedded Computing
A survey of checker architectures
ACM Computing Surveys (CSUR)
A low-power instruction replay mechanism for design of resilient microprocessors
ACM Transactions on Embedded Computing Systems (TECS)
Hi-index | 0.02 |
Diminutive devices and high clock frequency of future microprocessor generations are causing increased concerns for transient soft failures in hardware, necessitating fault detection and recovery mechanisms even in commodity processors. In this paper, we propose a fault-tolerant extension for modern superscalar out-of-order datapath that can be supported by only modest additional hardware. In the proposed extensions, error-detection is achieved by verifying the redundant results of dynamically replicated threads of executions, while the error-recovery scheme employs the instruction-rewind mechanism to restart at a failed instruction. We study the performance impact of augmenting superscalar microarchitectures with this fault tolerance mechanism. An analytical performance model is used in conjunction with a performance simulator. The simulation results of 11 SPEC95 and SPEC2000 benchmarks show that in the absence of faults, error detection causes a 2% to 45% reduction in throughput, which is in line with other proposed detection schemes. In the presence of transient faults, the fast error recovery scheme contributes very little additional slowdown.