Self-Stabilizing Microprocessor: Analyzing and Overcoming Soft Errors

  • Authors:
  • Shlomi Dolev; Yinnon A. Haviv

  • Affiliations:
  • IEEE;-

  • Venue:
  • IEEE Transactions on Computers
  • Year:
  • 2006

Abstract

Soft errors are changes in memory values caused by external radiation or electrical noise. Decreases in feature size and power usage, together with the shortening of the microcycle period, increase the influence of soft errors. Self-stabilizing systems are designed to be started in an arbitrary, possibly corrupted, state (due to, say, soft errors) and to converge to a desired behavior. Self-stabilization is defined by the state space of the components and is essentially a well-founded, clearly defined form of the terms self-healing, automatic recovery, automatic repair, and autonomic computing. To implement a self-stabilizing system, one needs to ensure that the microprocessor that executes the program is itself self-stabilizing. A self-stabilizing microprocessor copes with any combination of soft errors, converging to perform fetch-decode-execute in fault-free periods. Still, it is important that the microprocessor avoid convergence periods where possible by masking the effect of soft errors immediately. In this work, we present design schemes for a self-stabilizing microprocessor and a new technique for analyzing the effect of soft errors. Previous schemes for analyzing the effect of soft errors were based on simulations. In contrast, our scheme computes a lower bound on microprocessor reliability and enables the microprocessor designer to evaluate the reliability of the design and to identify reliability bottlenecks. When analyzing the resiliency of digital circuits to soft errors, we examine logical masking, i.e., errors in internal nodes of the circuit that are masked later by the computation. We show that computing the reliability of a circuit while taking logical masking into account is NP-hard.
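
To make the notion of logical masking concrete, the following is a minimal sketch and not the authors' analysis technique. It assumes a hypothetical three-input circuit, out = (a AND b) OR c, a single bit flip on the internal AND node, and uniformly distributed inputs; the names `circuit` and `masking_fraction` are illustrative only. It enumerates every input vector and counts how often the flipped internal node leaves the output unchanged. Exhaustive enumeration of this kind grows exponentially with the number of inputs, which is in keeping with the hardness result stated above.

```python
from itertools import product


def circuit(a, b, c, flip_internal=False):
    """Hypothetical circuit: out = (a AND b) OR c.

    When flip_internal is True, a soft error inverts the internal AND node
    before it reaches the OR gate.
    """
    n = a & b          # internal node
    if flip_internal:
        n ^= 1         # injected soft error on the internal node
    return n | c       # primary output


def masking_fraction():
    """Fraction of input vectors for which the internal flip is logically masked."""
    inputs = list(product((0, 1), repeat=3))
    masked = sum(
        1
        for a, b, c in inputs
        if circuit(a, b, c) == circuit(a, b, c, flip_internal=True)
    )
    return masked / len(inputs)


if __name__ == "__main__":
    # The flip is masked exactly when c == 1, i.e. for 4 of the 8 input vectors.
    print(masking_fraction())  # prints 0.5
```

In this toy circuit the OR gate hides the error whenever c is 1, so half of the input vectors mask the flip; a different gate structure or input distribution would give a different fraction.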