Fingerprinting: hash-based error detection in microprocessors

  • Authors: Jared C. Smolens
  • Affiliations: Carnegie Mellon University
  • Venue: Fingerprinting: hash-based error detection in microprocessors
  • Year: 2007

Abstract

Today's commodity processors are tuned primarily for performance and power. As CMOS scaling continues into the deep sub-micron regime, soft errors and device wearout will increasingly jeopardize the reliability of unprotected processor pipelines. To preserve reliable operation, processor cores will require mechanisms to detect errors affecting the control and datapaths. Conventional techniques such as parity, error correcting codes, and self-checking circuits have high implementation overheads and therefore these techniques cannot be easily applied to complex and timing-critical high-performance pipelines. This thesis proposes and evaluates architectural and microarchitectural fingerprints. A fingerprint is a compact (e.g., 16-bit) signature of recent architectural or microarchitectural state updates. By periodically comparing a fingerprint and a reference over an interval of execution, the system can detect errors in a timely and bandwidth-efficient manner. Architectural fingerprints capture in-order architectural state with lightweight monitoring hardware at the retirement stages of a pipeline, while microarchitectural fingerprints leverage existing design-for-test hardware to accumulate a signature of internal state. This thesis explores two applications of fingerprints. In the Reunion execution model, this thesis shows that architectural fingerprints can detect both soft errors and input incoherence with complexity-effective redundant execution in a chip multiprocessor. Cycle-accurate simulation shows that the performance overhead is only 5-6% over more complicated designs that strictly replicate inputs. In another application, FIRST, fingerprints detect emerging wearout faults by periodically testing the processor under marginal operating conditions. 
Wearout fault simulation in a commercial processor shows that architectural fingerprints have high coverage of widespread wearout, while microarchitectural fingerprints provide superior coverage of both individual and widespread wearout.
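
The fingerprint comparison described in the abstract can be illustrated with a minimal software sketch. It is not the thesis's hardware design: the hash function (CRC-16-CCITT via Python's `binascii.crc_hqx`), the `(register, value)` update format, the interval length, and the names `fingerprint` and `first_mismatch` are all assumptions made for illustration. The sketch folds each interval of retired state updates into a 16-bit signature and compares it against a reference stream, as two redundant executions might.

```python
import binascii
import struct

MASK64 = (1 << 64) - 1


def fingerprint(updates, seed=0xFFFF):
    """Fold (register, value) updates into a 16-bit CRC (assumed hash)."""
    fp = seed
    for reg, value in updates:
        # Pack a 1-byte register index and a 64-bit value, little-endian.
        fp = binascii.crc_hqx(struct.pack("<BQ", reg, value), fp)
    return fp


def first_mismatch(stream_a, stream_b, interval=4):
    """Return the index of the first interval whose fingerprints differ,
    or None if every compared interval matches."""
    n = min(len(stream_a), len(stream_b))
    for start in range(0, n, interval):
        fa = fingerprint(stream_a[start:start + interval])
        fb = fingerprint(stream_b[start:start + interval])
        if fa != fb:
            return start // interval
    return None


# Hypothetical retirement stream: 16 register updates.
golden = [(i % 32, (i * 0x9E3779B97F4A7C15) & MASK64) for i in range(16)]

# Same stream with a single-bit flip injected into update 9.
faulty = list(golden)
reg, value = faulty[9]
faulty[9] = (reg, value ^ (1 << 17))

print(first_mismatch(golden, golden))  # no divergence
print(first_mismatch(golden, faulty))  # interval containing update 9
```

Only 16 bits per interval are exchanged rather than every update, which is where the bandwidth saving comes from. The cost is a small chance of aliasing: a multi-bit error can, with low probability, produce the same fingerprint as the reference.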