Instruction-Level Impact Analysis of Low-Level Faults in a Modern Microprocessor Controller

  • Authors:
  • Michail Maniatakos;Naghmeh Karimi;Chandra Tirumurti;Abhijit Jas;Yiorgos Makris

  • Affiliations:
  • Yale University, New Haven;University of Tehran, Tehran;Intel Corporation, Santa Clara;Intel Corporation, Austin;Yale University, New Haven

  • Venue:
  • IEEE Transactions on Computers
  • Year:
  • 2011

Quantified Score

Hi-index 14.98

Visualization

Abstract

We investigate the correlation between low-level faults in the control logic of a modern microprocessor and their instruction-level impact on the execution of typical workload. Such information can prove immensely useful in accurately assessing and prioritizing faults with regards to their criticality, as well as commensurately allocating resources to enhance online testability and error/fault resilience through concurrent error detection/correction methods. To this end, we developed an extensive fault simulation infrastructure which allows injection of stuck-at faults and transient errors of arbitrary starting time and duration, as well as cost-effective simulation and classification of their repercussions into various instruction-level error types. As a test vehicle for our study, we employ a superscalar, dynamically-scheduled, out-of-order, Alpha-like microprocessor, on which we execute SPEC2000 integer benchmarks. Extensive fault injection campaigns in control modules of this microprocessor facilitate valuable observations regarding the distribution of low-level faults into the instruction-level error types that they cause. Experimentation with both Register Transfer (RT-) and Gate-Level faults, as well as with both stuck-at faults and transient errors, confirms the validity and corroborates the utility of these observations.