Hardware failure due to wearout is a growing concern. Circuit failure prediction is an effective approach if it meets four requirements: low design complexity, low overheads, generality (supporting various types of wearout, including soft and hard breakdown), and high accuracy. State-of-the-art techniques, which typically detect and measure low-level circuit properties such as gate delay, cannot deliver on all four requirements. Moving away from the paradigm of measuring circuit delays is key to satisfying them. Our insight is to virtually age the processor and thus manifest a wearout fault early: we convert the delay degradation into a logic fault, expose the fault, and then detect it. To virtually age the processor, we reduce the supply voltage, which effectively mirrors wearout. For fault exposure, we observe that faults in critical paths are naturally exposed, and we develop a technique to expose faults along the non-critical paths using clock phase shifting logic. Our system, Aged-SDMR, combines these two mechanisms to expose wearout faults early and detects them using Sampling DMR. We also develop principles for combining these two mechanisms with any detection technique. We implement a prototype system based on the OpenRISC processor on a Xilinx Zynq FPGA. We demonstrate that Aged-SDMR is practical and delivers on all four requirements: it has area and energy overheads of 9% and 0.7% respectively, takes at most 0.4 days to detect failure after onset, and has a configurable early warning window. More generally, Aged-SDMR provides the capability for low-overhead DMR execution with no missed errors and 100% coverage. It is likely to find broad uses within reliability and elsewhere.
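The virtual-aging idea can be illustrated with a toy timing model. The sketch below is not the Aged-SDMR implementation. It assumes the textbook alpha-power-law relation between supply voltage and gate delay, and every number in it (threshold voltage, exponent, path delay, clock period, voltages) is a hypothetical value chosen for illustration. It shows only that a path which still meets timing at nominal voltage starts to violate timing at a smaller amount of wearout once the supply voltage is lowered, which is what turns gradual delay degradation into an early, detectable logic fault.

```python
def relative_delay(vdd, vth=0.3, alpha=1.3):
    """Gate delay up to a constant factor, per the alpha-power law.

    vth and alpha are hypothetical illustration values.
    """
    return vdd / (vdd - vth) ** alpha


def path_fails(nominal_delay_ns, wear, vdd, clock_ns, v_nominal=1.0):
    """Return True if the path misses the clock edge.

    wear is the fractional delay increase caused by wearout (0.05 = 5%).
    The delay at vdd is the nominal delay scaled by the ratio of the
    alpha-power-law delays at vdd and at v_nominal.
    """
    scale = relative_delay(vdd) / relative_delay(v_nominal)
    return nominal_delay_ns * (1.0 + wear) * scale > clock_ns


def first_failing_wear(nominal_delay_ns, vdd, clock_ns, step=0.01, limit=1.0):
    """Smallest wear level (in multiples of step) at which the path fails."""
    for i in range(int(round(limit / step)) + 1):
        wear = i * step
        if path_fails(nominal_delay_ns, wear, vdd, clock_ns):
            return wear
    return None  # no failure within the explored wear range


if __name__ == "__main__":
    # Hypothetical path: 0.9 ns at nominal voltage, 1.0 ns clock period.
    at_nominal = first_failing_wear(0.9, vdd=1.0, clock_ns=1.0)
    at_reduced = first_failing_wear(0.9, vdd=0.9, clock_ns=1.0)
    print("fails at nominal voltage once wear reaches", at_nominal)
    print("fails at reduced voltage once wear reaches", at_reduced)
```

In this model, the gap between the two wear levels plays the role of the early warning window, and choosing a different reduced voltage changes its size. The sketch covers only paths whose delay is close to the clock period. Exposing faults on non-critical paths, which the abstract handles with clock phase shifting logic, is not modeled here.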