Unified reliability estimation and management of NoC based chip multiprocessors

  • Authors:
  • Alexandre Yasuo Yamamoto; Cristinel Ababei

  • Venue:
  • Microprocessors & Microsystems
  • Year:
  • 2014

Abstract

We present a new architecture-level, unified reliability evaluation methodology for chip multiprocessors (CMPs). The proposed reliability estimation tool (REST) is based on a Monte Carlo algorithm. What distinguishes REST from previous work is that both the computational and the communication components are considered in a unified manner to compute the reliability of the CMP. We use REST to develop a new dynamic reliability management (DRM) scheme that addresses the time-dependent dielectric breakdown (TDDB) and negative-bias temperature instability (NBTI) aging mechanisms in network-on-chip (NoC) based CMPs. Designed as a control loop, the proposed DRM scheme uses an effective neural-network-based reliability estimation module, whose predictor is trained using REST. We investigate how the system's lifetime changes depending on whether the NoC, as the communication unit of the CMP, is considered during the reliability evaluation process, and find that the differences can be as high as 60%. Full-system simulations using a customized GEM5 simulator show that, in a best-effort scenario with a user-set target lifetime of 7 years, the proposed DRM scheme improves reliability by up to 52% over the case when no DRM is employed, at a performance penalty of 2-9%.
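To show why including the NoC in a Monte Carlo reliability estimate can shorten the predicted lifetime, here is a minimal sketch. It is not REST. It assumes that each core and each router has an independent Weibull-distributed lifetime with fixed, made-up parameters (`CORE_SCALE_YEARS`, `ROUTER_SCALE_YEARS`, `WEIBULL_SHAPE` are hypothetical). It also assumes a series system, in which the CMP fails at the first component failure. The paper's actual TDDB/NBTI models, temperature dependence, and failure criteria are not reproduced here.

```python
import random
import statistics

# Hypothetical per-component Weibull parameters (scale in years, shape).
# These are illustrative assumptions, not values from the paper.
CORE_SCALE_YEARS = 12.0
ROUTER_SCALE_YEARS = 15.0
WEIBULL_SHAPE = 2.0


def sample_system_lifetime(rng, n_cores, n_routers, include_noc):
    """Draw one system lifetime, assuming a series system:
    the CMP fails as soon as any modeled component fails."""
    lifetimes = [rng.weibullvariate(CORE_SCALE_YEARS, WEIBULL_SHAPE)
                 for _ in range(n_cores)]
    if include_noc:
        # Routers are only modeled when the NoC is part of the evaluation.
        lifetimes += [rng.weibullvariate(ROUTER_SCALE_YEARS, WEIBULL_SHAPE)
                      for _ in range(n_routers)]
    return min(lifetimes)


def estimate_mttf(n_cores, n_routers, include_noc, trials=20000, seed=1):
    """Monte Carlo estimate of the mean time to failure, in years."""
    rng = random.Random(seed)
    samples = [sample_system_lifetime(rng, n_cores, n_routers, include_noc)
               for _ in range(trials)]
    return statistics.fmean(samples)


if __name__ == "__main__":
    # Assumed 4x4 mesh: one core and one router per tile.
    cores_only = estimate_mttf(16, 16, include_noc=False)
    with_noc = estimate_mttf(16, 16, include_noc=True)
    drop = 100.0 * (cores_only - with_noc) / cores_only
    print(f"MTTF, cores only : {cores_only:.2f} years")
    print(f"MTTF, cores + NoC: {with_noc:.2f} years")
    print(f"Lifetime reduction when the NoC is modeled: {drop:.1f}%")
```

With these toy parameters, the analytic MTTFs are about 2.66 years (cores only) and about 2.08 years (cores plus routers). The size of the gap depends entirely on the assumed parameters and is not meant to reproduce the 60% figure reported above.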