Systematic Unidirectional Error-Detecting Codes
IEEE Transactions on Computers
Evaluating Associativity in CPU Caches
IEEE Transactions on Computers
Digital integrated circuits: a design perspective
Digital integrated circuits: a design perspective
ICS '90 Proceedings of the 4th international conference on Supercomputing
MediaBench: a tool for evaluating and synthesizing multimedia and communicatons systems
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Reliable computer systems (3rd ed.): design and evaluation
Reliable computer systems (3rd ed.): design and evaluation
Few electron devices: towards hybrid CMOS-SET integrated circuits
Proceedings of the 39th annual Design Automation Conference
Fault-Secure Parity Prediction Arithmetic Operators
IEEE Design & Test
The Performance of Cache-Based Error Recovery in Multiprocessors
IEEE Transactions on Parallel and Distributed Systems
A Fault Tolerant Approach to Microprocessor Design
DSN '01 Proceedings of the 2001 International Conference on Dependable Systems and Networks (formerly: FTCS)
Cherry: checkpointed early resource recycling in out-of-order microprocessors
Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
WHICH CONCURRENT ERROR DETECTION SCHEME TO CHOOSE?
ITC '00 Proceedings of the 2000 IEEE International Test Conference
Exploiting Microarchitectural Redundancy For Defect Tolerance
ICCD '03 Proceedings of the 21st International Conference on Computer Design
Tutorial Part 1: Nanometer-Scale CMOS Devices
ISQED '04 Proceedings of the 5th International Symposium on Quality Electronic Design
Manufacturing-Aware Physical Design
Proceedings of the 2003 IEEE/ACM international conference on Computer-aided design
Fingerprinting: bounding soft-error detection latency and bandwidth
ASPLOS XI Proceedings of the 11th international conference on Architectural support for programming languages and operating systems
VLSI Design Challenges for Gigascale Integration
VLSID '05 Proceedings of the 18th International Conference on VLSI Design held jointly with 4th International Conference on Embedded Systems Design
Microarchitecture-Based Introspection: A Technique for Transient-Fault Tolerance in Microprocessors
DSN '05 Proceedings of the 2005 International Conference on Dependable Systems and Networks
Cherry-MP: Correctly Integrating Checkpointed Early Resource Recycling in Chip Multiprocessors
Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
A Mechanism for Online Diagnosis of Hard Faults in Microprocessors
Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
MiBench: A free, commercially representative embedded benchmark suite
WWC '01 Proceedings of the Workload Characterization, 2001. WWC-4. 2001 IEEE International Workshop
Evaluation of Test Metrics: Stuck-at, Bridge Coverage Estimate and Gate Exhaustive
VTS '06 Proceedings of the 24th IEEE VLSI Test Symposium
Reliability limits for the gate insulator in CMOS technology
IBM Journal of Research and Development
Reliability-aware data placement for partial memory protection in embedded processors
Proceedings of the 2006 workshop on Memory system performance and correctness
Low-cost protection for SER upsets and silicon defects
Proceedings of the conference on Design, automation and test in Europe
Adapting to intermittent faults in multicore systems
Proceedings of the 13th international conference on Architectural support for programming languages and operating systems
Proceedings of the 13th international conference on Architectural support for programming languages and operating systems
Invited paper: Variability in nanometer CMOS: Impact, analysis, and minimization
Integration, the VLSI Journal
Proceedings of the 17th international conference on Parallel architectures and compilation techniques
Mixed-mode multicore reliability
Proceedings of the 14th international conference on Architectural support for programming languages and operating systems
End-to-end register data-flow continuous self-test
Proceedings of the 36th annual international symposium on Computer architecture
Software-assisted hardware reliability: abstracting circuit-level challenges to the software stack
Proceedings of the 46th Annual Design Automation Conference
mSWAT: low-cost hardware fault detection and diagnosis for multicore systems
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
Tribeca: design for PVT variations with local recovery and fine-grained adaptation
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
Using introspective software-based testing for post-silicon debug and repair
Proceedings of the 47th Design Automation Conference
Eliminating voltage emergencies via software-guided code transformations
ACM Transactions on Architecture and Code Optimization (TACO)
NBTI-aware DVFS: a new approach to saving energy and increasing processor lifetime
Proceedings of the 16th ACM/IEEE international symposium on Low power electronics and design
Scalable thread scheduling and global power management for heterogeneous many-core architectures
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Design techniques for cross-layer resilience
Proceedings of the Conference on Design, Automation and Test in Europe
Improving yield and reliability of chip multiprocessors
Proceedings of the Conference on Design, Automation and Test in Europe
Application-aware diagnosis of runtime hardware faults
Proceedings of the International Conference on Computer-Aided Design
Specification and synthesis of hardware checkpointing and rollback mechanisms
Proceedings of the 49th Annual Design Automation Conference
Viper: virtual pipelines for enhanced reliability
Proceedings of the 39th Annual International Symposium on Computer Architecture
IVF: characterizing the vulnerability of microprocessor structures to intermittent faults
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
A survey of checker architectures
ACM Computing Surveys (CSUR)
Virtually-aged sampling DMR: unifying circuit failure prediction and circuit failure detection
Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture
A low-power instruction replay mechanism for design of resilient microprocessors
ACM Transactions on Embedded Computing Systems (TECS)
Hi-index | 0.00 |
The sustained push toward smaller and smaller technology sizes has reached a point where device reliability has moved to the forefront of concerns for next-generation designs. Silicon failure mechanisms, such as transistor wearout and manufacturing defects, are a growing challenge that threatens the yield and product lifetime of future systems. In this paper we introduce the BulletProof pipeline, the first ultra low-cost mechanism to protect a microprocessor pipeline and on-chip memory system from silicon defects. To achieve this goal we combine area-frugal on-line testing techniques and system-level checkpointing to provide the same guarantees of reliability found in traditional solutions, but at much lower cost. Our approach utilizes a microarchitectural checkpointing mechanism which creates coarse-grained epochs of execution, during which distributed on-line built in self-test (BIST) mechanisms validate the integrity of the underlying hardware. In case a failure is detected, we rely on the natural redundancy of instructionlevel parallel processors to repair the system so that it can still operate in a degraded performance mode. Using detailed circuit-level and architectural simulation, we find that our approach provides very high coverage of silicon defects (89%) with little area cost (5.8%). In addition, when a defect occurs, the subsequent degraded mode of operation was found to have only moderate performance impacts, (from 4% to 18% slowdown).