Transient fault detection via simultaneous multithreading
Proceedings of the 27th annual international symposium on Computer architecture
Wattch: a framework for architectural-level power analysis and optimizations
Proceedings of the 27th annual international symposium on Computer architecture
Symbiotic jobscheduling for a simultaneous multithreaded processor
ASPLOS IX Proceedings of the ninth international conference on Architectural support for programming languages and operating systems
Transient-fault recovery using simultaneous multithreading
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
Detailed design and evaluation of redundant multithreading alternatives
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
Modeling the Effect of Technology Trends on the Soft Error Rate of Combinational Logic
DSN '02 Proceedings of the 2002 International Conference on Dependable Systems and Networks
AR-SMT: A Microarchitectural Approach to Fault Tolerance in Microprocessors
FTCS '99 Proceedings of the Twenty-Ninth Annual International Symposium on Fault-Tolerant Computing
Transient-fault recovery for chip multiprocessors
Proceedings of the 30th annual international symposium on Computer architecture
Razor: A Low-Power Pipeline Based on Circuit-Level Timing Speculation
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Reliability, availability, and serviceability (RAS) of the IBM eServer z990
IBM Journal of Research and Development
Fingerprinting: bounding soft-error detection latency and bandwidth
ASPLOS XI Proceedings of the 11th international conference on Architectural support for programming languages and operating systems
NonStop® Advanced Architecture
DSN '05 Proceedings of the 2005 International Conference on Dependable Systems and Networks
Cherry-MP: Correctly Integrating Checkpointed Early Resource Recycling in Chip Multiprocessors
Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
MinneSPEC: A New SPEC Benchmark Workload for Simulation-Based Computer Architecture Research
IEEE Computer Architecture Letters
Characterization of simultaneous multithreading (SMT) efficiency in POWER5
IBM Journal of Research and Development - POWER5 and packaging
Reunion: Complexity-Effective Multicore Redundancy
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Configurable isolation: building high availability systems with commodity multi-core processors
Proceedings of the 34th annual international symposium on Computer architecture
Utilizing Dynamically Coupled Cores to Form a Resilient Chip Multiprocessor
DSN '07 Proceedings of the 37th Annual IEEE/IFIP International Conference on Dependable Systems and Networks
INTACTE: an interconnect area, delay, and energy estimation tool for microarchitectural explorations
CASES '07 Proceedings of the 2007 international conference on Compilers, architecture, and synthesis for embedded systems
Reliable Systems on Unreliable Fabrics
IEEE Design & Test
DDMR: Dynamic and Scalable Dual Modular Redundancy with Short Validation Intervals
IEEE Computer Architecture Letters
Architectural core salvaging in a multi-core processor for hard-error tolerance
Proceedings of the 36th annual international symposium on Computer architecture
Energy-efficient redundant execution for chip multiprocessors
Proceedings of the 20th symposium on Great lakes symposium on VLSI
Cost-effective safety and fault localization using distributed temporal redundancy
CASES '11 Proceedings of the 14th international conference on Compilers, architectures and synthesis for embedded systems
Hi-index | 0.00 |
Continued CMOS scaling is expected to make future microprocessors susceptible to transient faults, hard faults, manufacturing defects and process variations causing fault tolerance to become important even for general purpose processors targeted at the commodity market. To mitigate the effect of decreased reliability, a number of fault-tolerant architectures have been proposed that exploit the natural coarse-grained redundancy available in chip multiprocessors (CMPs). These architectures execute a single application using two threads, typically as one leading thread and one trailing thread. Errors are detected by comparing the outputs produced by these two threads. These architectures schedule a single application on two cores or two thread contexts of a CMP. As a result, besides the additional energy consumption and performance overhead that is required to provide fault tolerance, such schemes also impose a throughput loss. Consequently a CMP which is capable of executing 2n threads in non-redundant mode can only execute half as many (n) threads in fault-tolerant mode. In this paper we propose multiplexed redundant execution (MRE), a low-overhead architectural technique that executes multiple trailing threads on a single processor core. MRE exploits the observation that it is possible to accelerate the execution of the trailing thread by providing execution assistance from the leading thread. Execution assistance combined with coarse-grained multithreading allows MRE to schedule multiple trailing threads concurrently on a single core with only a small performance penalty. Our results show that MRE increases the throughput of fault-tolerant CMP by 16% over an ideal dual modular redundant (DMR) architecture.