There is growing concern about the increasing rate of defects in computing substrates. Traditional redundancy solutions are too expensive for commodity microprocessor systems. Modern microprocessors feature multiple execution units to exploit instruction-level parallelism (ILP), yet most workloads do not exhibit as much ILP as a typical microprocessor is provisioned for. This offers an opportunity to re-execute instructions on idle execution units. However, relying solely on idle resources does not provide full instruction coverage, so other alternatives must be explored. To that end, we propose and evaluate two instruction replay schemes, both operating within the same core, for online testing of the execution units. One scheme (RER) re-executes only retired instructions, while the other (REI) re-executes all issued instructions. The complete solution requires only a comparator and minor modifications to the control logic, resulting in negligible hardware overhead. Both soft and hard error detection are considered, and the performance and energy impacts of both schemes are evaluated and compared against previously proposed redundant execution schemes. The results show that although the proposed schemes incur a small performance penalty compared with previous work, they significantly reduce the energy overhead.
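As a rough illustration of how the two replay policies differ, the following Python sketch models two functional units, one carrying a hypothetical stuck-at-1 fault on a single output bit, and replays instructions under RER or REI with a simple comparator. Everything here is an assumption for illustration, not the authors' design: the `FunctionalUnit`, `Instr`, `issue`, and `replay_and_check` names, the 32-bit datapath, the fault model, and steering each replay to a different unit (same-unit replay with the same operands would catch only transient faults, not permanent ones). Timing, idle-unit scheduling, and the rest of the pipeline are ignored.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

MASK = 0xFFFFFFFF  # assumption: 32-bit datapath

# Integer operations supported by the hypothetical functional units.
OPS: Dict[str, Callable[[int, int], int]] = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "and": lambda a, b: a & b,
    "xor": lambda a, b: a ^ b,
}


@dataclass
class FunctionalUnit:
    uid: int
    stuck_bit: Optional[int] = None  # hypothetical hard fault: this output bit is stuck at 1

    def execute(self, op: str, a: int, b: int) -> int:
        result = OPS[op](a, b) & MASK
        if self.stuck_bit is not None:
            result |= 1 << self.stuck_bit
        return result


@dataclass
class Instr:
    seq: int
    op: str
    a: int
    b: int
    unit: int        # index of the unit that performs the primary execution
    retires: bool    # False for squashed (e.g. wrong-path) instructions
    result: int = 0  # primary result, filled in by issue()


def issue(instrs: List[Instr], units: List[FunctionalUnit]) -> None:
    """Primary execution: each instruction runs on its assigned unit."""
    for ins in instrs:
        ins.result = units[ins.unit].execute(ins.op, ins.a, ins.b)


def replay_and_check(
    instrs: List[Instr], units: List[FunctionalUnit], scheme: str
) -> Tuple[int, List[int]]:
    """Re-execute and compare; returns (number of replays, seqs that mismatched).

    RER: replay only instructions that retire.
    REI: replay every issued instruction, squashed ones included.
    """
    if scheme not in ("RER", "REI"):
        raise ValueError(f"unknown scheme: {scheme}")
    replays, mismatches = 0, []
    for ins in instrs:
        if scheme == "RER" and not ins.retires:
            continue
        # Assumption: steer the replay to another unit so a hard fault in the
        # primary unit (or in the checker) surfaces as a mismatch.
        checker = units[(ins.unit + 1) % len(units)]
        replays += 1
        if checker.execute(ins.op, ins.a, ins.b) != ins.result:  # the comparator
            mismatches.append(ins.seq)
    return replays, mismatches


if __name__ == "__main__":
    units = [FunctionalUnit(0), FunctionalUnit(1, stuck_bit=4)]
    prog = [
        Instr(0, "add", 16, 1, unit=0, retires=True),      # 17: bit 4 already set
        Instr(1, "xor", 5, 5, unit=1, retires=False),      # squashed; faulty unit yields 16
        Instr(2, "sub", 50, 2, unit=1, retires=True),      # 48: bit 4 already set
        Instr(3, "and", 0xFF, 0x1F, unit=0, retires=True), # 31: bit 4 already set
    ]
    issue(prog, units)
    print(replay_and_check(prog, units, "RER"))  # (3, [])
    print(replay_and_check(prog, units, "REI"))  # (4, [1])
```

In this toy trace the only instruction whose result exposes the fault is the squashed `xor`, so REI flags it while RER, which never replays squashed work, does not. REI also performs one more replay, which hints at the extra energy spent when everything that issues is re-executed. Actual coverage and overhead depend on the workload and the microarchitecture.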