IBM experiments in soft fails in computer electronics (1978–1994)
IBM Journal of Research and Development - Special issue: terrestrial cosmic rays and soft errors
Transient fault detection via simultaneous multithreading
Proceedings of the 27th annual international symposium on Computer architecture
Wattch: a framework for architectural-level power analysis and optimizations
Proceedings of the 27th annual international symposium on Computer architecture
Dual use of superscalar datapath for transient-fault detection and recovery
Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
Transient-fault recovery for chip multiprocessors
Proceedings of the 30th annual international symposium on Computer architecture
Exploiting Coarse-Grain Verification Parallelism for Power-Efficient Fault Tolerance
Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques
Reliability aware power management for dual-processor real-time embedded systems
Proceedings of the 47th Design Automation Conference
Design techniques for cross-layer resilience
Proceedings of the Conference on Design, Automation and Test in Europe
As device dimensions continue to scale down, microprocessors are becoming increasingly vulnerable to environmental disturbances such as cosmic particle strikes, which can cause transient errors. Redundancy therefore becomes more important for preventing operational failures caused by these errors. Exploiting the natural structural redundancy of multi-core architectures to execute multiple copies of the same program is an effective approach that incurs very little design complexity. Unfortunately, existing Redundant Multi-Threading (RMT) approaches incur high power overhead, a significant disadvantage in an era when power is arguably the most important limiting factor in microprocessors.

This paper presents an RMT microarchitecture that significantly reduces this power overhead without impacting performance. The approach exploits the fact that when verification is parallelized across multiple cores, each core can run much more slowly and therefore in a much more energy-efficient configuration, for example through voltage scaling. The design uses a novel approach to buffer a large number of unverified stores while still allowing fast searches to enforce dependences. This in turn allows the computation thread to run far ahead of the verification threads, creating enough work for efficient parallelization.
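The energy argument for parallel verification can be illustrated with a rough first-order model. The sketch below is not taken from the paper: it assumes dynamic energy per operation scales with the square of the supply voltage, that voltage can be lowered linearly with frequency down to a floor, and it ignores leakage and the cost of the extra cores. The function name `verification_energy` and the parameters `v_nominal` and `v_min` are hypothetical.

```python
def verification_energy(n_cores, v_nominal=1.0, v_min=0.5):
    """Relative dynamic energy of verifying a fixed workload on n_cores.

    Returns energy relative to one verifier core at nominal voltage.
    Assumptions (illustrative only): energy per operation ~ V^2, voltage
    scales linearly with frequency down to v_min, leakage is ignored.
    """
    if n_cores < 1:
        raise ValueError("n_cores must be at least 1")
    # Each verifier only needs 1/n of the nominal frequency to keep pace
    # with the computation thread.
    freq_scale = 1.0 / n_cores
    # Assumed linear voltage-frequency relation, floored at v_min.
    voltage = max(v_min, v_nominal * freq_scale)
    # The total number of verified operations is unchanged, so total
    # dynamic energy scales with V^2 alone.
    return (voltage / v_nominal) ** 2


if __name__ == "__main__":
    for n in (1, 2, 4):
        print(n, verification_energy(n))
```

Under these assumptions the savings stop growing once the voltage floor is reached, which is one reason a real design would have to weigh the number of verifier cores against leakage and buffering costs.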