Low-cost protection for SER upsets and silicon defects

Authors:
Mojtaba Mehrara;Mona Attariyan;Smitha Shyam;Kypros Constantinides;Valeria Bertacco;Todd Austin
Affiliations:
University of Michigan, Ann Arbor, MI;University of Michigan, Ann Arbor, MI;University of Michigan, Ann Arbor, MI;University of Michigan, Ann Arbor, MI;University of Michigan, Ann Arbor, MI;University of Michigan, Ann Arbor, MI
Venue:
Proceedings of the conference on Design, automation and test in Europe
Year:
2007

Abstract

Extreme transistor scaling in silicon technology is approaching a point where manufactured systems will suffer from limited device reliability and severely reduced lifetime, due to early transistor failures, gate-oxide wear-out, manufacturing defects, and radiation-induced soft errors (SER). In this paper we present a low-cost technique to harden a microprocessor pipeline and caches against these reliability threats. Our approach uses online built-in self-test (BIST) and microarchitectural checkpointing to detect and diagnose faults and to recover computation impaired by silicon defects or SER events. The approach works by periodically testing the processor to determine whether a component is broken. If so, we reconfigure the processor to avoid using the broken component. A similar mechanism is used to detect SER faults, with the difference that recovery is implemented by re-execution. By relying on low-cost techniques to address both defects and SER, we keep protection costs significantly lower than those of traditional fault-tolerance approaches while providing high coverage for a wide range of faults. Using detailed gate-level simulation, we find that our approach provides 95% coverage for silicon defects and 99% coverage for SER events, with only a 14% area overhead.
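
The control flow described in the abstract (checkpoint, execute, test, then either reconfigure or re-execute) can be illustrated with a small software model. This is only a sketch under stated assumptions, not the authors' hardware design: the ToyCore class, its two interchangeable ALUs, the epoch granularity, and the "suspect" flag that stands in for the error detection mechanism are all hypothetical names and simplifications introduced here for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ToyCore:
    """Hypothetical model: an accumulator served by interchangeable ALUs."""
    acc: int = 0
    alus: list = field(default_factory=lambda: ["alu0", "alu1"])
    broken: set = field(default_factory=set)  # units with a permanent defect
    upset_pending: bool = False               # one-shot transient (SER) upset

    def execute(self, value):
        """Run one epoch of work; return True if the epoch is suspect.

        Assumption: some detection mechanism flags a corrupted epoch.
        Here the flag is simply derived from the injected fault state.
        """
        suspect = self.upset_pending or self.alus[0] in self.broken
        self.upset_pending = False            # a transient hits one epoch only
        self.acc += value + (1 if suspect else 0)  # a fault corrupts the result
        return suspect

    def self_test(self):
        """Stand-in for BIST: True if the active ALU passes its test."""
        return self.alus[0] not in self.broken


def run(core, epochs, log):
    """Execute epochs with checkpoint, test, and recovery."""
    for value in epochs:
        checkpoint = core.acc                 # save state before the epoch
        while True:
            if not core.execute(value):
                break                         # clean epoch: commit
            core.acc = checkpoint             # roll back the corrupted epoch
            if core.self_test():
                # Hardware tests clean, so treat the fault as transient
                # and recover by re-executing the epoch.
                log.append("transient: re-execute")
            else:
                # Hardware tests bad: stop using the defective unit.
                bad = core.alus.pop(0)
                if not core.alus:
                    raise RuntimeError("no working unit left")
                log.append(f"defect in {bad}: reconfigure")
    return core.acc


if __name__ == "__main__":
    log = []
    core = ToyCore(broken={"alu0"}, upset_pending=True)
    print(run(core, [1, 2, 3], log), log)
```

In this model a permanent defect costs one rollback plus the loss of a unit, while a transient upset costs only one re-executed epoch. The real trade-offs (test coverage, epoch length, area overhead) depend on details that the abstract does not give.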