Self-checking instructions: reducing instruction redundancy for concurrent error detection

  • Authors:
  • Sumeet Kumar; Aneesh Aggarwal

  • Affiliations:
  • Binghamton University, Binghamton, NY; Binghamton University, Binghamton, NY

  • Venue:
  • Proceedings of the 15th international conference on Parallel architectures and compilation techniques
  • Year:
  • 2006

Abstract

With shrinking feature sizes, increasing chip capacity, and increasing clock speeds, microprocessors are becoming increasingly susceptible to transient (soft) errors. Redundant multi-threading (RMT) is an attractive approach for concurrent error detection. However, redundant thread execution has a significant impact on performance and energy consumption in the chip. In this paper, we propose reducing instruction redundancy (the instructions that are redundantly executed) as a means to mitigate the performance and energy impact of redundancy. We experiment with a decoupled RMT approach in which the frontend pipeline stages are protected through error codes, while the backend pipeline stages are protected through redundant execution. In this approach, we define two categories of instructions: self-checking and semi self-checking instructions. Self-checking instructions are those whose results are checked for errors when their "main" copies are executed. These instructions are not redundantly executed. Semi self-checking instructions are those for which a major part of their results is checked when the "main" copies are executed, and the remaining part is checked using a small amount of additional hardware. Reducing instruction redundancy with this approach provides the same fault coverage as the base architecture, in which all instructions are redundantly executed. The techniques are evaluated in terms of their performance, power, and vulnerability impact on the RMT processor. Our experiments show that the techniques reduce instruction redundancy by about 58% and recover about 51% of the performance lost due to redundant execution. Our techniques also recover about 40% of the energy consumption increase in the key data-path structures.