Self-Stabilizing Microprocessor: Analyzing and Overcoming Soft Errors

Authors:
Shlomi Dolev;Yinnon A. Haviv
Affiliations:
IEEE;-
Venue:
IEEE Transactions on Computers
Year:
2006

Abstract

Soft errors are changes in memory values caused by external radiation or electrical noise. Decreasing feature sizes and power usage, together with shortening the microcycle period, increase the influence of soft errors. Self-stabilizing systems are designed to be started in an arbitrary, possibly corrupted, state (due to, say, soft errors) and to converge to a desired behavior. Self-stabilization is defined by the state space of the components and is essentially a well-founded, clearly defined form of the terms self-healing, automatic-recovery, automatic-repair, and autonomic-computing. To implement a self-stabilizing system, one needs to ensure that the microprocessor that executes the program is itself self-stabilizing. A self-stabilizing microprocessor copes with any combination of soft errors, converging to perform fetch-decode-execute in fault-free periods. Still, it is important that the microprocessor avoid convergence periods where possible by masking the effect of soft errors immediately. In this work, we present design schemes for a self-stabilizing microprocessor and a new technique for analyzing the effect of soft errors. Previous schemes for analyzing the effect of soft errors were based on simulations. In contrast, our scheme computes a lower bound on microprocessor reliability, enabling the microprocessor designer to evaluate the reliability of the design and to identify reliability bottlenecks. When analyzing the resiliency of digital circuits to soft errors, we examine logical masking, i.e., errors in internal nodes of a circuit that are masked later by the computation. We show that computing the reliability of a circuit while taking logical masking into account is NP-hard.
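
To make the notion of logical masking concrete, here is a minimal sketch. It is not the analysis technique of the paper. The three-input circuit `out = (a AND b) OR c`, the function names, and the assumption of uniformly distributed inputs are all hypothetical choices made for illustration. The code flips the internal AND node to model a soft error and counts, by exhaustive enumeration, how often the output is unaffected.

```python
from itertools import product


def toy_circuit(a, b, c, flip_internal=False):
    """Hypothetical circuit: out = (a AND b) OR c.

    When flip_internal is True, the value of the internal AND node is
    inverted to model a single soft error at that node.
    """
    node = a & b
    if flip_internal:
        node ^= 1  # inject the soft error on the internal node
    return node | c


def masking_fraction():
    """Fraction of input assignments for which the injected error is
    logically masked, i.e. the output equals the fault-free output.

    Assumes (for illustration only) that all inputs are equally likely.
    """
    masked = 0
    total = 0
    for a, b, c in product((0, 1), repeat=3):
        total += 1
        if toy_circuit(a, b, c) == toy_circuit(a, b, c, flip_internal=True):
            masked += 1
    return masked / total


if __name__ == "__main__":
    print(masking_fraction())
```

In this toy circuit the error is masked exactly when `c` is 1, because the OR gate then forces the output to 1 regardless of the AND node. Enumerating all input assignments grows exponentially with the number of inputs, so this brute-force approach is only practical for very small circuits.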