ACM SIGARCH Computer Architecture News
Register renaming is a widely used technique for removing false dependencies in contemporary superscalar microprocessors. A register alias table (RAT) holds the current locations of the values that correspond to the architectural registers. Some recently designed processors copy the rename table at each branch instruction so that its contents can be recovered when a misspeculation occurs. In this paper, we first investigate the vulnerability of the RAT to transient errors. We then analyze the vulnerability of RAT checkpoints and propose two techniques for soft error detection and correction that use the redundant copies of entries whose content is the same as in the previous and/or next checkpoint. Simulation results for the SPEC 2006 benchmarks show that, on average, RAT vulnerability is 25% and checkpoint vulnerability is 6%. The results also show that redundancy exists across sequential checkpoint copies and can be used for error detection and correction. With the proposed techniques, which exploit this redundancy, faults can be detected in 41% of all checkpoints and 44% of rolled-back checkpoints, and errors can be corrected in 33% of rolled-back checkpoints. Because the techniques rely on storage that is already available, they can be implemented with minimal hardware overhead.
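The idea of reusing neighboring checkpoint copies can be illustrated with a small software model. This is only a sketch under stated assumptions, not the mechanism evaluated in the paper: the names `check_entry` and `repair_checkpoint`, the per-interval sets of renamed registers, and the majority-vote rule are all hypothetical. It assumes the hardware can tell which architectural registers were renamed between two consecutive checkpoints, so that an entry not renamed in an interval is expected to hold the same tag in both copies.

```python
from typing import List, Set, Tuple

# A checkpoint is modeled as one physical-register tag per architectural register.
Checkpoint = List[int]


def check_entry(prev: int, cur: int, nxt: int,
                same_as_prev: bool, same_as_next: bool) -> Tuple[str, int]:
    """Classify one entry of the rolled-back checkpoint `cur`.

    Assumption: same_as_prev / same_as_next say whether the entry is
    expected to be identical in the neighboring checkpoint.
    """
    copies = [cur]
    if same_as_prev:
        copies.append(prev)
    if same_as_next:
        copies.append(nxt)

    if len(copies) == 1:
        return "unprotected", cur      # no redundant copy exists
    if all(c == cur for c in copies):
        return "ok", cur               # all available copies agree
    if len(copies) == 3:
        # Three copies: a majority vote can recover the value.
        for c in copies:
            if copies.count(c) >= 2:
                return "corrected", c
    # Two disagreeing copies (or three distinct values): detection only.
    return "detected", cur


def repair_checkpoint(prev: Checkpoint, cur: Checkpoint, nxt: Checkpoint,
                      renamed_before: Set[int],
                      renamed_after: Set[int]) -> Tuple[Checkpoint, List[str]]:
    """Check every entry of `cur` against its neighbors.

    renamed_before: registers renamed between `prev` and `cur` (hypothetical input).
    renamed_after:  registers renamed between `cur` and `nxt` (hypothetical input).
    """
    repaired, statuses = [], []
    for reg, tag in enumerate(cur):
        status, value = check_entry(prev[reg], tag, nxt[reg],
                                    reg not in renamed_before,
                                    reg not in renamed_after)
        repaired.append(value)
        statuses.append(status)
    return repaired, statuses


# Register 2 is renamed before `cur`, register 1 after it.
prev = [10, 11, 12, 13]
cur = [10, 11, 20, 13]
nxt = [10, 21, 20, 13]

faulty = list(cur)
faulty[0] ^= 0b100   # single-bit flip in an entry that has two redundant copies
faulty[1] ^= 0b100   # single-bit flip in an entry that has one redundant copy

repaired, statuses = repair_checkpoint(prev, faulty, nxt, {2}, {1})
```

In this model an entry left unchanged across both neighboring intervals has three copies and can be corrected, an entry unchanged across only one interval has two copies and supports detection only, and an entry renamed in both intervals has no redundant copy at all. That split mirrors why the abstract reports a lower correction rate than detection rate.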