Modeling the cosmic-ray-induced soft-error rate in integrated circuits: an overview
IBM Journal of Research and Development - Special issue: terrestrial cosmic rays and soft errors
The SimpleScalar tool set, version 2.0
ACM SIGARCH Computer Architecture News
Area efficient architectures for information integrity in cache memories
ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
Fault-Containment in Cache Memories for TMR Redundant Processor Systems
IEEE Transactions on Computers
DIVA: a reliable substrate for deep submicron microarchitecture design
Proceedings of the 32nd annual ACM/IEEE international symposium on Microarchitecture
Selective cache ways: on-demand cache resource allocation
Proceedings of the 32nd annual ACM/IEEE international symposium on Microarchitecture
Transient fault detection via simultaneous multithreading
Proceedings of the 27th annual international symposium on Computer architecture
Dual use of superscalar datapath for transient-fault detection and recovery
Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
The Alpha 21264 Microprocessor
IEEE Micro
Power4 System Design for High Reliability
IEEE Micro
Basic Block Distribution Analysis to Find Periodic Behavior and Simulation Points in Applications
Proceedings of the 2001 International Conference on Parallel Architectures and Compilation Techniques
Software-Implemented Fault Detection for High-Performance Space Applications
DSN '00 Proceedings of the 2000 International Conference on Dependable Systems and Networks (formerly FTCS-30 and DCCA-8)
Fault Tolerance through Re-Execution in Multiscalar Architecture
DSN '00 Proceedings of the 2000 International Conference on Dependable Systems and Networks (formerly FTCS-30 and DCCA-8)
Modeling the Effect of Technology Trends on the Soft Error Rate of Combinational Logic
DSN '02 Proceedings of the 2002 International Conference on Dependable Systems and Networks
Soft error and energy consumption interactions: a data cache perspective
Proceedings of the 2004 international symposium on Low power electronics and design
DATE '03 Proceedings of the conference on Design, Automation and Test in Europe - Volume 1
Understanding Scheduling Replay Schemes
HPCA '04 Proceedings of the 10th International Symposium on High Performance Computer Architecture
Compiler-guided register reliability improvement against soft errors
Proceedings of the 5th ACM international conference on Embedded software
Selective replication: A lightweight technique for soft errors
ACM Transactions on Computer Systems (TOCS)
Proceedings of the Conference on Design, Automation and Test in Europe
On the exploitation of narrow-width values for improving register file reliability
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
COOL: control-based optimization of load-balancing for thermal behavior
Proceedings of the eighth IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis
Application-specific power-efficient approach for reducing register file vulnerability
DATE '12 Proceedings of the Conference on Design, Automation and Test in Europe
Hi-index | 0.00 |
Transient errors are one of the major reasons for system downtime in many systems. While prior research has mainly focused on the impact of transient errors on datapath, caches and main memories, the register file has largely been neglected. Since the register file is accessed very frequently, the probability of transient errors is high. In addition, errors in it can quickly spread to different parts of the system, and cause application crash or silent data corruption. This paper addresses the reliability of register files in superscalar processors. Particularly, we propose to duplicate actively used physical registers in unused physical registers. The rationale behind this idea is that if the protection mechanism (parity or ECC) used for the primary copy indicates an error, the duplicate can provide the data as long as it is not corrupted. We implement two types of strategies based on this register duplication idea. In the "conservative strategy," we limit ourselves with the given register usage behavior, and duplicate register contents only on otherwise unused registers. Consequently, there is no impact on the original performance when there is no error, except for the protection mechanism used for the primary copy. Our experiments with two different versions of this strategy show that, with the more powerful conservative scheme, 78% of the accesses are to the physical registers with duplicates .The "aggressive strategy" sacrifices some performance to increase the number of register accesses with duplicates. It does so by marking the registers not used for a long time as "dead" and using them for duplicating actively used registers. The experiments with this strategy indicate that it takes the fraction of the reliable register accesses to 84%, and degrades the overall performance by only 0.21% on the average.