Simultaneous multithreading: maximizing on-chip parallelism
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
DIVA: a reliable substrate for deep submicron microarchitecture design
Proceedings of the 32nd annual ACM/IEEE international symposium on Microarchitecture
Transient fault detection via simultaneous multithreading
Proceedings of the 27th annual international symposium on Computer architecture
Efficient checker processor design
Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
A study of slipstream processors
Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
Transient-fault recovery using simultaneous multithreading
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
Dual use of superscalar datapath for transient-fault detection and recovery
Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
Automatically characterizing large scale program behavior
Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
DSN '00 Proceedings of the 2000 International Conference on Dependable Systems and Networks (formerly FTCS-30 and DCCA-8)
A Fault Tolerant Approach to Microprocessor Design
DSN '01 Proceedings of the 2001 International Conference on Dependable Systems and Networks (formerly: FTCS)
Modeling the Effect of Technology Trends on the Soft Error Rate of Combinational Logic
DSN '02 Proceedings of the 2002 International Conference on Dependable Systems and Networks
DSN '02 Proceedings of the 2002 International Conference on Dependable Systems and Networks
AR-SMT: A Microarchitectural Approach to Fault Tolerance in Microprocessors
FTCS '99 Proceedings of the Twenty-Ninth Annual International Symposium on Fault-Tolerant Computing
Checkpoint Processing and Recovery: Towards Scalable Large Instruction Window Processors
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Recursive TMR: Scaling Fault Tolerance in the Nanoscale Era
IEEE Design & Test
Exploiting Coarse-Grain Verification Parallelism for Power-Efficient Fault Tolerance
Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques
Self-checking instructions: reducing instruction redundancy for concurrent error detection
Proceedings of the 15th international conference on Parallel architectures and compilation techniques
Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
SlicK: slice-based locality exploitation for efficient redundant multithreading
Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
Configurable isolation: building high availability systems with commodity multi-core processors
Proceedings of the 34th annual international symposium on Computer architecture
Dynamic prediction of architectural vulnerability from microarchitectural state
Proceedings of the 34th annual international symposium on Computer architecture
Hierarchical Verification for Increasing Performance in Reliable Processors
Journal of Electronic Testing: Theory and Applications
StageNetSlice: a reconfigurable microarchitecture building block for resilient CMP systems
CASES '08 Proceedings of the 2008 international conference on Compilers, architectures and synthesis for embedded systems
Proceedings of the 17th international conference on Parallel architectures and compilation techniques
The StageNet fabric for constructing resilient multicore systems
Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
Compiler-assisted soft error detection under performance and energy constraints in embedded systems
ACM Transactions on Embedded Computing Systems (TECS)
End-to-end register data-flow continuous self-test
Proceedings of the 36th annual international symposium on Computer architecture
Selective replication: A lightweight technique for soft errors
ACM Transactions on Computer Systems (TOCS)
Shoestring: probabilistic soft error reliability on the cheap
Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems
On the exploitation of narrow-width values for improving register file reliability
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Trade-offs in transient fault recovery schemes for redundant multithreaded processors
HiPC'06 Proceedings of the 13th international conference on High Performance Computing
Resource-Driven optimizations for transient-fault detecting superscalar microarchitectures
ACSAC'05 Proceedings of the 10th Asia-Pacific conference on Advances in Computer Systems Architecture
Efficient soft error protection for commodity embedded microprocessors using profile information
Proceedings of the 13th ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, Tools and Theory for Embedded Systems
Dynamic transient fault detection and recovery for embedded processor datapaths
Proceedings of the eighth IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis
A low-power instruction replay mechanism for design of resilient microprocessors
ACM Transactions on Embedded Computing Systems (TECS)
Hi-index | 0.00 |
Previous proposals for soft-error tolerance have called for redundantly executing a program as two concurrent threads on a superscalar microarchitecture. In a balanced superscalar design, the extra workload from redundant execution induces a severe performance penalty due to increased contention for resources throughout the datapath. This paper identifies and analyzes four key factors that affect the performance of redundant execution, namely 1) issue bandwidth and functional unit contention, 2) issue queue and reorder buffer capacity contention, 3) decode and retirement bandwidth contention, and 4) coupling between redundant threads' dynamic resource requirements. Based on this analysis, we propose the SHREC microarchitecture for asymmetric and staggered redundant execution. This microarchitecture addresses the four factors in an integrated design without requiring prohibitive additional hardware resources. In comparison to conventional single-threaded execution on a state-of-the-art superscalar microarchitecture with comparable cost, SHREC reduces the average performance penalty to within 4% on integer and 15% on floating-point SPEC2K benchmarks by sharing resources more efficiently between the redundant threads.