As device dimensions continue to be aggressively scaled, microprocessors are becoming increasingly vulnerable to undesired energy deposition, such as from a cosmic-ray particle strike, which can cause transient errors. To prevent operational failures due to these errors, future processors will increasingly require system-level techniques such as redundant execution for fault detection and tolerance. However, the need for redundancy runs directly counter to the growing need for power-efficient operation. Conventional techniques that use multi-core microarchitectures to duplicate whole threads generally incur significant energy overhead, which, for a given throughput requirement, exacerbates the already severe problems of power consumption and heat dissipation. Future approaches must therefore supply the necessary level of robustness at a given throughput while remaining power-aware. We propose a thread-level redundant execution microarchitecture that significantly reduces the energy overhead of replication without unduly impacting performance. Our approach exploits the fact that, with appropriate hardware support, the verification operation can be parallelized and run on a chip multiprocessor that supports frequency scaling together with supply voltage scaling and/or body biasing. To further improve the efficiency of verification, we use information obtained by the leading thread to assist the trailing verification threads. We discuss the required architectural support in detail and show that our approach can be highly energy-efficient: with two checkers, fully replicated execution costs on average only 28% more energy than non-redundant execution, with virtually no performance loss.
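The energy argument behind parallel, voltage-scaled verification can be illustrated with a first-order model. Splitting verification across N checkers lets each checker run at 1/N of the leader's frequency, and therefore at a lower supply voltage. Dynamic energy per operation falls roughly with V², but every extra checker also leaks for the whole run. The sketch below is not the authors' evaluation methodology. The linear voltage-frequency relation, the voltage floor, the leakage share, and the assumption that checking costs as much dynamic work as executing are all hypothetical. Its numbers are not meant to reproduce the 28% figure.

```python
"""First-order energy model for parallel, frequency/voltage-scaled checkers.

Illustrative sketch only: the linear V-f relation, the voltage floor and the
leakage fraction are hypothetical parameters, not values from the paper.
"""

from dataclasses import dataclass


@dataclass
class CoreModel:
    f_nom: float = 1.0      # nominal frequency (normalized)
    v_nom: float = 1.0      # nominal supply voltage (normalized)
    v_floor: float = 0.6    # hypothetical minimum functional voltage
    leak_frac: float = 0.2  # hypothetical leakage share of energy at nominal V/f

    def voltage(self, f: float) -> float:
        """Assume V scales linearly with f down to a floor (a crude DVFS model)."""
        return max(self.v_floor, self.v_nom * f / self.f_nom)


def relative_energy(n_checkers: int, core: CoreModel) -> float:
    """Energy of leader + n checkers, normalized to a non-redundant run.

    Hypothetical assumptions:
    - the leader runs at nominal V/f for a normalized time T = 1;
    - checking costs as many dynamic operations as executing, split evenly,
      so each checker runs at f_nom / n to keep pace with the leader;
    - dynamic energy per operation ~ V^2; leakage power ~ V (crude), paid for all of T.
    """
    if n_checkers < 1:
        raise ValueError("need at least one checker")
    dyn_nom = 1.0 - core.leak_frac   # leader's dynamic energy over T
    leak_nom = core.leak_frac        # leader's leakage energy over T
    baseline = dyn_nom + leak_nom    # the non-redundant run (= 1.0)

    scale = core.voltage(core.f_nom / n_checkers) / core.v_nom
    # Together, the checkers perform as many operations as the leader.
    checker_dyn = dyn_nom * scale ** 2
    # Every checker stays powered for the full interval T.
    checker_leak = n_checkers * leak_nom * scale
    return (baseline + checker_dyn + checker_leak) / baseline


if __name__ == "__main__":
    core = CoreModel()
    for n in range(1, 5):
        print(n, round(relative_energy(n, core), 3))
```

With these made-up parameters, one full-speed checker doubles the energy (2.0x), two checkers at the voltage floor give about 1.53x, and three or four checkers are worse again (about 1.65x and 1.77x). Once the voltage floor is reached, additional checkers add leakage without any further V² savings. Any reduction in checker work from leading-thread hints would lower the dynamic term further, but that effect is not modeled here.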