Towards understanding the effects of intermittent hardware faults on programs

  • Authors: Layali Rashid; Karthik Pattabiraman; Sathish Gopalakrishnan
  • Affiliations: The University of British Columbia, Canada (all authors)
  • Venue: DSNW '10 Proceedings of the 2010 International Conference on Dependable Systems and Networks Workshops (DSN-W)
  • Year: 2010

Abstract

Intermittent hardware faults are bursts of errors that last from a few CPU cycles to a few seconds. They are caused by process variations, circuit wear-out, and temperature, clock or voltage fluctuations. Recent studies show that intermittent fault rates are increasing due to technology scaling and are likely to be a significant concern in future systems. We study the propagation of intermittent faults to programs; in particular, we are interested in the crash behaviour of programs. We use a model of a program that represents the data dependencies in a fault-free trace of the program and we analyze this model to glean some information about the length of intermittent faults and their effect on the program under specific fault and crash models. The results of our study can aid fault detection, diagnosis and recovery techniques.
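
As a rough illustration of the kind of analysis the abstract describes, the following Python sketch walks a fault-free trace, tracks data dependencies between instructions, and reports whether a fault burst of a given length leads to a crash. Everything concrete here is an assumption made for the example and not taken from the paper: the `Instr` record and the toy `TRACE`, the fault model (every result produced during the burst is corrupted), and the crash model (the program crashes when a corrupted value is used as a memory address).

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple


@dataclass(frozen=True)
class Instr:
    """One entry of a fault-free trace (hypothetical format)."""
    dest: Optional[str]            # name written by the instruction, if any
    srcs: Tuple[str, ...] = ()     # names read as data operands
    address: Optional[str] = None  # name read as a memory address, if any


def propagate(trace: List[Instr], fault_start: int,
              fault_len: int) -> Tuple[Optional[int], Set[str]]:
    """Replay the trace with a fault burst covering
    [fault_start, fault_start + fault_len).

    Returns (index of the crashing instruction or None,
             set of names holding corrupted values when replay stops).
    """
    corrupted: Set[str] = set()
    for i, ins in enumerate(trace):
        # Assumed crash model: a corrupted address operand crashes the program.
        if ins.address is not None and ins.address in corrupted:
            return i, corrupted

        # Assumed fault model: any result produced during the burst is corrupted.
        in_burst = fault_start <= i < fault_start + fault_len
        # Data dependency: a result is corrupted if any data operand is.
        tainted = in_burst or any(s in corrupted for s in ins.srcs)

        if ins.dest is not None:
            if tainted:
                corrupted.add(ins.dest)
            else:
                # A clean write overwrites (masks) an earlier corruption.
                corrupted.discard(ins.dest)
    return None, corrupted


# Toy fault-free trace (hypothetical):
TRACE = [
    Instr("r1"),                       # 0: r1 = constant
    Instr("r2", ("r1",)),              # 1: r2 = r1 + r1
    Instr("r3", address="r2"),         # 2: r3 = load [r2]
    Instr("r4", ("r3", "r1")),         # 3: r4 = r3 + r1
    Instr(None, ("r4",), address="r1"),  # 4: store r4 -> [r1]
]

if __name__ == "__main__":
    for start in range(len(TRACE)):
        crash_at, bad = propagate(TRACE, fault_start=start, fault_len=1)
        print(start, crash_at, sorted(bad))
```

Under these assumptions, a one-instruction burst at index 1 corrupts `r2` and crashes at the load on index 2, whereas the same burst at index 3 corrupts only the stored data and never reaches an address operand, so the replay finishes without a crash. Varying `fault_start` and `fault_len` over a real trace is one way such a model could relate fault length to crash behaviour.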