On the impact of hardware faults --- an investigation of the relationship between workload inputs and failure mode distributions

  • Authors:
  • Domenico Di Leo; Fatemeh Ayatolahi; Behrooz Sangchoolie; Johan Karlsson; Roger Johansson

  • Affiliations:
  • Dipartimento di Informatica e Sistemistica, Università degli Studi di Napoli Federico II, Naples, Italy (Domenico Di Leo); Department of Computer Science & Engineering, Chalmers University of Technology, Gothenburg, Sweden (Fatemeh Ayatolahi, Behrooz Sangchoolie, Johan Karlsson, Roger Johansson)

  • Venue:
  • SAFECOMP '12: Proceedings of the 31st International Conference on Computer Safety, Reliability, and Security
  • Year:
  • 2012

Abstract

Technology scaling of integrated circuits is making transistors increasingly sensitive to process variations, wear-out effects and ionizing particles. This may lead to an increasing rate of transient and intermittent errors in future microprocessors. To assess the risk such errors pose to safety-critical systems, it is essential to investigate how temporary errors in instruction set architecture (ISA) registers and main memory locations influence the behaviour of executing programs. To this end, we use extensive fault injection experiments to investigate how such errors affect the execution of four target programs. The paper makes three contributions. First, we investigate how the failure modes of the target programs vary for different input sets. Second, we evaluate the error coverage of a software-implemented hardware fault tolerance technique that relies on triple-time redundant execution, majority voting and forward recovery. Third, we propose an approach based on assembly language metrics that can be used to correlate the dynamic fault-free behaviour of a program with its failure mode distribution obtained by fault injection.
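The combination of triple-time redundant (TTR) execution, majority voting and forward recovery mentioned above can be illustrated with a toy model. This is a minimal sketch, not the authors' implementation. Everything in it is an assumption made for illustration: the 32-bit state, the hypothetical `toy_step` workload, and a fault model that flips a single bit in the result of exactly one of the three copies. Each step runs three times, a voter picks the value produced by at least two copies, and all copies continue from that voted value (forward recovery), so earlier steps never need to be re-executed. The `coverage` function runs a small exhaustive injection campaign and reports the fraction of faults whose final output matches the fault-free (golden) run.

```python
from collections import Counter
from typing import Callable, List, Optional

WIDTH = 32  # assumed register width in bits (illustrative)
MASK = (1 << WIDTH) - 1


def flip_bit(value: int, bit: int) -> int:
    """Model a transient single-bit error in a WIDTH-bit register."""
    return (value ^ (1 << bit)) & MASK


def majority_vote(results: List[int]) -> Optional[int]:
    """Return the value produced by at least two of the three copies, else None."""
    value, count = Counter(results).most_common(1)[0]
    return value if count >= 2 else None


def ttr_step(step: Callable[[int], int], state: int,
             faulty_copy: Optional[int] = None, bit: int = 0) -> int:
    """Execute one step three times, vote, and continue from the voted state."""
    results = []
    for copy in range(3):
        out = step(state)
        if copy == faulty_copy:
            out = flip_bit(out, bit)  # corrupt only this copy's result
        results.append(out)
    voted = majority_vote(results)
    if voted is None:
        # All three copies disagree: the error is detected but cannot be masked.
        raise RuntimeError("no majority")
    # Forward recovery: every copy resumes from the majority value.
    return voted


def toy_step(x: int) -> int:
    """Hypothetical workload step standing in for one program segment."""
    return (x * 3 + 1) & MASK


def run(n_steps: int, inject_at: Optional[int] = None, faulty_copy: int = 1,
        bit: int = 5, protected: bool = True, seed: int = 7) -> int:
    """Run the toy workload, optionally injecting one bit flip at step inject_at."""
    state = seed
    for i in range(n_steps):
        if protected:
            copy = faulty_copy if i == inject_at else None
            state = ttr_step(toy_step, state, copy, bit)
        else:
            state = toy_step(state)
            if i == inject_at:
                state = flip_bit(state, bit)
    return state


def coverage(n_steps: int = 10) -> float:
    """Fraction of single-bit, single-copy faults whose output matches the golden run."""
    golden = run(n_steps, protected=False)  # fault-free reference output
    masked = total = 0
    for step_idx in range(n_steps):
        for bit in range(WIDTH):
            for copy in range(3):
                total += 1
                if run(n_steps, step_idx, copy, bit) == golden:
                    masked += 1
    return masked / total


if __name__ == "__main__":
    golden = run(10, protected=False)
    print("TTR masks the error:", run(10, inject_at=4) == golden)
    print("Unprotected output correct:", run(10, inject_at=4, protected=False) == golden)
    print("Coverage under the toy fault model:", coverage())
```

Under this narrow fault model the coverage is trivially 100%, because the voter and the state shared by all copies are never corrupted. Errors injected into ISA registers and main memory, as in the experiments described above, can also strike the voter, the input shared by the three copies, or the control flow, which this sketch does not model.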