Transient fault prediction based on anomalies in processor events

  • Authors:
  • Satish Narayanasamy;Ayse K. Coskun;Brad Calder

  • Affiliations:
  • University of California, San Diego;University of California, San Diego;University of California, San Diego and Microsoft

  • Venue:
  • Proceedings of the conference on Design, automation and test in Europe
  • Year:
  • 2007

Quantified Score

Hi-index 0.00

Visualization

Abstract

Future microprocessors will be highly susceptible to transient errors as the sizes of transistors decrease due to CMOS scaling. Prior techniques advocated full scale structural or temporal redundancy to achieve fault tolerance. Though they can provide complete fault coverage, they incur significant hardware and/or performance cost. It is desirable to have mechanisms that can provide partial but sufficiently high fault coverage with negligible cost. To meet this goal, we propose leveraging speculative structures that already exist in modern processors. The proposed mechanism is based on the insight that when a fault occurs, it is likely that the incorrect execution would result in abnormally higher or lower number of mispredictions (branch mispredictions, L2 misses, store set mispredictions) than a correct execution. We design a simple transient fault predictor that detects the anomalous behavior in the outcomes of the speculative structures to predict transient faults.