System-level reliability modeling for MPSoCs

  • Authors:
  • Yun Xiang;Thidapat Chantem;Robert P. Dick;X. Sharon Hu;Li Shang

  • Affiliations:
  • University of Michigan, Ann Arbor, MI, USA;University of Notre Dame, Notre Dame, IN, USA;University of Michigan, Ann Arbor, MI, USA;University of Notre Dame, Notre Dame, IN, USA;University of Colorado, Boulder, CO, USA

  • Venue:
  • CODES/ISSS '10 Proceedings of the eighth IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis
  • Year:
  • 2010

Abstract

The reliability of multi-processor systems-on-chip (MPSoCs) is affected by several interdependent system-level and physical effects. Accurate and fast reliability modeling is a primary challenge in the design and optimization of reliable MPSoCs. This paper presents a reliability modeling framework that integrates device-, component-, and system-level models. The framework contains modules for electromigration, time-dependent dielectric breakdown, stress migration, and variable-amplitude thermal cycling. A new statistical reliability distribution is proposed to accurately characterize components that contain too few devices for an extreme-value distribution to be appropriate. A hierarchical, system-level, survival-lattice-based Monte Carlo technique is used to estimate the temporal fault distributions of MPSoCs that use arbitrary static and dynamic reliability-enhancing redundancy schemes. Physical process variation, which may significantly affect MPSoC reliability, is also considered in the model. The proposed modeling technique has an average error of only 5% in mean time to failure (MTTF) and reduces simulation time by nearly three orders of magnitude relative to a non-hierarchical Monte Carlo technique.
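The abstract compares its hierarchical method against a non-hierarchical Monte Carlo technique. The following is a hedged illustration of that kind of baseline only. It is not the authors' survival-lattice method, and it omits their failure-mechanism models, thermal effects, process variation, and dynamic redundancy. It assumes independent, identically distributed Weibull component lifetimes and a static k-out-of-n redundancy scheme. All function names and parameter values are hypothetical. For the all-components-required (series) case, the minimum of n i.i.d. Weibull lifetimes has a closed form, so the sketch can check the Monte Carlo estimate against it.

```python
import math
import random


def weibull_mean(scale, shape):
    """Analytic mean of a Weibull(scale, shape) lifetime."""
    return scale * math.gamma(1.0 + 1.0 / shape)


def series_weibull_mttf(n, scale, shape):
    """Closed-form MTTF of n i.i.d. Weibull components in series.

    The minimum of n i.i.d. Weibull lifetimes is again Weibull with the same
    shape and its scale reduced by a factor of n ** (-1 / shape).
    """
    return weibull_mean(scale * n ** (-1.0 / shape), shape)


def k_of_n_lifetime(lifetimes, k):
    """System lifetime when at least k of the n components must survive.

    The system fails at the (n - k + 1)-th component failure.
    """
    n = len(lifetimes)
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    return sorted(lifetimes)[n - k]


def estimate_mttf(n, k, scale, shape, trials=20000, seed=1):
    """Plain (non-hierarchical) Monte Carlo estimate of system MTTF."""
    rng = random.Random(seed)  # fixed seed for reproducible estimates
    total = 0.0
    for _ in range(trials):
        # weibullvariate(alpha, beta): alpha is the scale, beta is the shape
        lifetimes = [rng.weibullvariate(scale, shape) for _ in range(n)]
        total += k_of_n_lifetime(lifetimes, k)
    return total / trials


if __name__ == "__main__":
    # Hypothetical parameters: scale 10 (arbitrary time units), shape 2.
    print("series 3-of-3 MC:", estimate_mttf(3, 3, 10.0, 2.0))
    print("series 3-of-3 exact:", series_weibull_mttf(3, 10.0, 2.0))
    print("redundant 2-of-3 MC:", estimate_mttf(3, 2, 10.0, 2.0))
```

Every trial samples every component, so the cost of this flat approach grows with both the number of components and the number of trials. That cost is the motivation the abstract gives for a hierarchical technique.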