IVF: characterizing the vulnerability of microprocessor structures to intermittent faults

  • Authors:
  • Songjun Pan; Yu Hu; Xiaowei Li

  • Affiliations:
  • Key Laboratory of Computer System and Architecture, Institute of Computing Technology, Chinese Academy of Sciences, and The Graduate University of Chinese Academy of Sciences, Beijing, China; Key Laboratory of Computer System and Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China; Key Laboratory of Computer System and Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China

  • Venue:
  • IEEE Transactions on Very Large Scale Integration (VLSI) Systems
  • Year:
  • 2012

Abstract

As CMOS technology scales into the nanometer era, shipped microprocessors will become increasingly vulnerable to intermittent faults. Quantitatively characterizing the vulnerability of microprocessor structures to intermittent faults at an early design stage helps designers balance system reliability against performance. Prior research has proposed several metrics for analyzing the vulnerability of microprocessor structures to soft errors and hard faults; the vulnerability of these structures to intermittent faults, however, has rarely been considered. In this work, we propose a metric, the intermittent vulnerability factor (IVF), to characterize the vulnerability of microprocessor structures to intermittent faults. A structure's IVF is the probability that an intermittent fault in that structure causes an externally visible error (failure). We compute IVFs for the reorder buffer and the register file under three intermittent fault models: the intermittent stuck-at-1 and stuck-at-0 fault model, the intermittent open and short fault model, and the intermittent timing fault model. Experimental results show that, among the three types of intermittent faults, intermittent stuck-at-1 faults have the most serious impact on program execution. In addition, IVF varies significantly across individual structures and programs, which suggests that protection can be applied selectively to the most vulnerable structures and program phases to minimize performance and/or energy overheads.
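
To make the IVF definition concrete, the following is a minimal illustrative sketch that estimates the probability as the fraction of injected faults that become visible. It is not the authors' methodology, which the abstract does not describe. It assumes a toy register-file model, covers only the intermittent stuck-at fault model, and assumes that every read in the trace reaches program output. The names `Fault`, `fault_is_visible`, and `estimate_ivf` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fault:
    """Hypothetical intermittent stuck-at fault on one bit of one entry."""
    entry: int     # index of the affected register-file entry
    bit: int       # affected bit position
    stuck_at: int  # value the bit is forced to while active (0 or 1)
    start: int     # first cycle in which the fault is active
    length: int    # burst duration in cycles


def fault_is_visible(trace, fault):
    """Return True if any read observes a wrong bit while the fault is active.

    `trace` is a list of (cycle, entry, value) reads. Toy assumption: every
    read in the trace propagates to an externally visible result.
    """
    end = fault.start + fault.length
    for cycle, entry, value in trace:
        if entry != fault.entry or not (fault.start <= cycle < end):
            continue  # the read misses the faulty entry or the active burst
        if (value >> fault.bit) & 1 != fault.stuck_at:
            return True  # the stuck value differs from the correct bit
    return False


def estimate_ivf(trace, faults):
    """Estimate IVF as the fraction of injected faults that become visible."""
    if not faults:
        raise ValueError("at least one injected fault is required")
    visible = sum(fault_is_visible(trace, f) for f in faults)
    return visible / len(faults)


# Two reads: entry 0 holds 0b01 at cycle 2, entry 1 holds 0b10 at cycle 5.
trace = [(2, 0, 0b01), (5, 1, 0b10)]
faults = [
    Fault(entry=0, bit=0, stuck_at=0, start=0, length=4),  # flips a 1: visible
    Fault(entry=0, bit=0, stuck_at=1, start=0, length=4),  # matches bit: masked
    Fault(entry=1, bit=1, stuck_at=0, start=6, length=2),  # burst after the read
    Fault(entry=1, bit=1, stuck_at=0, start=4, length=2),  # flips a 1: visible
]
ivf = estimate_ivf(trace, faults)
print(ivf)  # 0.5
```

In this toy trace two of the four injected faults corrupt a value that is read during the burst, so the estimate is 0.5. A fault whose stuck value equals the stored bit, or whose burst misses every read, is masked and does not count toward IVF.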