Soft-error tolerance and mitigation in asynchronous burst-mode circuits

  • Authors:
  • Sobeeh Almukhaizim; Feng Shi; Eric Love; Yiorgos Makris

  • Affiliations:
  • Department of Computer Engineering, Kuwait University, Safat, Kuwait; Central Analog Department, Marvell Semiconductor, Santa Clara, CA; Department of Electrical Engineering, Yale University, New Haven, CT; Department of Electrical Engineering, Yale University, New Haven, CT

  • Venue:
  • IEEE Transactions on Very Large Scale Integration (VLSI) Systems
  • Year:
  • 2009

Abstract

We discuss the problem of soft errors in asynchronous burst-mode machines (ABMMs), and we propose two solutions. The first solution is an error tolerance approach, which leverages the inherent functionality of Muller C-elements, along with a variant of duplication, to suppress all transient errors. The proposed method is more robust and less expensive than the typical triple modular redundancy error tolerance method and often even less expensive than previously proposed concurrent error detection methods, which only provide detection but no correction. The second solution is an error mitigation approach, which leverages a newly devised soft-error susceptibility assessment method for ABMMs, along with partial duplication, to suppress a carefully chosen subset of transient errors. Three progressively more powerful options for partial duplication select among individual gates, complete state/output logic cones, or partial state/output logic cones and enable efficient exploration of the tradeoff between the achieved soft-error susceptibility reduction and the incurred area overhead. Furthermore, a gate-decomposition method is developed to leverage the additional soft-error susceptibility reduction opportunities arising during conversion of a two-level ABMM implementation into a multilevel one. Extensive experimental results on benchmark ABMMs assess the effectiveness of the proposed methods in reducing soft-error susceptibility, and their impact on area, performance, and offline testability.
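The first solution relies on the state-holding behavior of a Muller C-element: its output follows its inputs only when they agree and otherwise keeps its previous value. The following is a minimal behavioral sketch of that idea, not the authors' implementation. It assumes a discrete-time abstraction with no gate delays, and the names `c_element`, `filter_trace`, `golden`, and `faulty` are hypothetical, introduced only for illustration.

```python
def c_element(a: int, b: int, prev: int) -> int:
    """Behavioral Muller C-element: follow the inputs when they agree, else hold."""
    return a if a == b else prev


def filter_trace(copy_a, copy_b, initial=0):
    """Feed two duplicated signal traces through one C-element, step by step."""
    out, state = [], initial
    for a, b in zip(copy_a, copy_b):
        state = c_element(a, b, state)
        out.append(state)
    return out


# Fault-free signal, assumed to be produced identically by both logic copies.
golden = [0, 0, 1, 1, 1, 0, 0]

# Hypothetical fault: copy A sees a one-step transient bit flip at index 3.
faulty = list(golden)
faulty[3] ^= 1

# The copies disagree only during the transient, so the C-element holds its value.
filtered = filter_trace(faulty, golden)
print(filtered == golden)
```

In this toy model a transient that hits only one copy never reaches the output, because the two inputs disagree for its whole duration. Real circuits also depend on timing and on the C-element itself being unaffected, which this sketch does not model.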