A Case for Fault Tolerance and Performance Enhancement Using Chip Multi-Processors

  • Authors:
  • Huiyang Zhou

  • Affiliations:
  • -

  • Venue:
  • IEEE Computer Architecture Letters
  • Year:
  • 2006

Quantified Score

Hi-index 0.00

Visualization

Abstract

This paper makes a case for using multi-core processors to simultaneously achieve transient-fault tolerance and performance enhancement. Our approach is extended from a recent latency-tolerance proposal, dual-core execution (DCE). In DCE, a program is executed twice in two processors, named the front and back processors. The front processor pre-processes instructions in a very fast yet highly accurate way and the back processor re-executes the instruction stream retired from the front processor. The front processor runs faster as it has nocorrectness constraints whereas its results, including timely prefetching and prompt branch misprediction resolution, help the back processor make faster progress. In this paper, we propose to entrust the speculative results of the front processor and use them to check the un-speculative results of the back processor. A discrepancy, either due to a transient fault or a mispeculation, is then handled with the existing mispeculation recovery mechanism. In this way, both transient-fault tolerance and performance improvement can be delivered simultaneously with little hardware overhead.