Fast online error detection and correction with thread signature calculae

  • Authors:
  • Bernhard Fechner

  • Affiliations:
  • University of Augsburg, 86159 Augsburg, Germany

  • Venue:
  • Microprocessors & Microsystems
  • Year:
  • 2012

Abstract

To detect transient control-flow and data faults caused by Single-Event Upsets (SEUs) in a microprocessor pipeline, several mechanisms that check execution at the retirement stage have been proposed and discussed over the years. In this paper, we suggest a checksum scheme, in both a compression-based and a compression-free variant, which detects transient faults before commitment and preserves binary compatibility. The scheme is applicable to time-redundant systems (virtual duplex and redundantly multithreaded systems) as well as structurally redundant systems. It can localize a fault by partial re-execution within the pipeline. By additionally introducing a modified micro-rollback, single or multiple pipeline stages can be rolled back for a retry. In the best case, a fault can be localized, detected and corrected in four clock cycles within a fine-grained redundantly threaded microprocessor. We validate and analyze the scheme through an FPGA and a standard-cell implementation and conclude that it can replace the well-known parity computation in high-performance designs.
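The general idea of comparing signatures of redundant executions before commit, and retrying on a mismatch, can be illustrated with a small software sketch. This is not the paper's hardware design: the rotate-and-XOR signature, the function names (`fold`, `signature`, `run_checked`, `make_executor`) and the software "retry" standing in for a micro-rollback are all assumptions made for illustration only.

```python
from typing import Callable, List, Tuple

MASK = 0xFFFFFFFF  # assumed 32-bit signature width


def rotl32(x: int, n: int) -> int:
    """Rotate a 32-bit value left by n bits."""
    n %= 32
    return ((x << n) | (x >> (32 - n))) & MASK


def fold(sig: int, value: int) -> int:
    """Hypothetical compression step: rotate the signature, then XOR in a result."""
    return rotl32(sig, 1) ^ (value & MASK)


def signature(values: List[int]) -> int:
    """Compress a sequence of results into one 32-bit signature."""
    sig = 0
    for v in values:
        sig = fold(sig, v)
    return sig


def run_checked(execute: Callable[[int], List[int]],
                max_retries: int = 2) -> Tuple[List[int], int]:
    """Run the same work twice, compare signatures, and retry on mismatch.

    `execute(run_id)` models one execution of a block of instructions;
    the retry loop stands in for a rollback (an assumption, not the
    paper's mechanism). Returns the committed results and the retry count.
    """
    for attempt in range(max_retries + 1):
        first = execute(2 * attempt)       # e.g. leading thread
        second = execute(2 * attempt + 1)  # e.g. trailing thread
        if signature(first) == signature(second):
            return first, attempt          # signatures agree: safe to commit
    raise RuntimeError("fault persisted after all retries")


def make_executor(fault_run: int) -> Callable[[int], List[int]]:
    """Build an executor that suffers a single bit flip in run `fault_run`."""
    def execute(run_id: int) -> List[int]:
        results = [3, 5, 8, 13]
        if run_id == fault_run:
            results[2] ^= 1 << 4  # injected transient fault (single bit flip)
        return results
    return execute


committed, retries = run_checked(make_executor(fault_run=1))
```

In this sketch the fault hits only the second execution of the first attempt, so the signatures disagree, nothing is committed, and the repeated attempt succeeds. A compression-free variant would compare the result lists directly instead of their signatures.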