This paper presents a dynamically scheduled pipeline structure for chip multiprocessors (CMPs). The technique exploits the redundancy already present in simultaneous multithreading (SMT), superscalar chip multiprocessors to provide low-overhead, broad fault coverage at the cost of performance degradation. The pipeline structure operates in two modes: 1) high-performance and 2) highly-reliable. In high-performance mode, each core operates as an ordinary SMT, superscalar processor. The highly-reliable mode makes three main contributions: 1) it enhances the reliability of the system without adding extra redundancy strictly for fault tolerance, 2) it detects both transient and permanent faults, and 3) it recovers from existing faults. Experimental results show that the diagnosis mechanism identifies faults quickly and accurately. The fault detection latency equals the pipeline length of the processor, and fault detection coverage is high. Moreover, the reliable processor continues to function capably in the presence of both transient and permanent faults, despite using no redundancy beyond what is already available in a modern microprocessor. In addition, in highly-reliable mode, static and dynamic power consumption are reduced by 25% and 36%, respectively.