Power-Efficient Error Tolerance in Chip Multiprocessors

  • Authors: M. Wasiur Rashid; Edwin J. Tan; Michael C. Huang; David H. Albonesi
  • Affiliations: University of Rochester; University of Rochester; University of Rochester; Cornell University
  • Venue: IEEE Micro
  • Year: 2005

Abstract

As device dimensions continue to scale down, microprocessors are becoming increasingly vulnerable to environmental disturbances such as cosmic particle strikes, which can cause transient errors. Redundancy is therefore becoming more imperative to prevent operational failure due to these errors. Exploiting the natural structural redundancy of multi-core architectures to execute multiple copies of the same program is an effective approach and incurs very little design complexity. Unfortunately, existing Redundant Multi-Threading (RMT) approaches incur high power overhead, a significant disadvantage in an era when power is arguably the most important limiting factor in microprocessors.

This paper presents an RMT microarchitecture that significantly reduces this power overhead without impacting performance. The approach exploits the fact that when verification is parallelized across multiple cores, each core can run much slower and therefore in a much more energy-efficient configuration, for example through voltage scaling. The design uses a novel approach to buffer a large number of unverified stores while still allowing fast searches to enforce dependences. This in turn allows the computation thread to run far ahead of the verification threads, creating enough of a workload for efficient parallelization.
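The energy argument in the abstract can be illustrated with a back-of-the-envelope model. The sketch below is not taken from the paper. It assumes the textbook relation that dynamic power is proportional to V^2 * f, that supply voltage can be lowered in proportion to frequency down to some floor, that leakage is ignored, and that verification work divides evenly across cores. The function name `relative_dynamic_energy` and the parameter `v_floor_ratio` are hypothetical and exist only for this illustration.

```python
def relative_dynamic_energy(n_cores: int, v_floor_ratio: float = 0.0) -> float:
    """Dynamic energy of verification split over n_cores, relative to
    running the same verification on one core at full speed.

    Assumptions (illustrative, not from the paper):
      - dynamic power ~ C * V^2 * f
      - each verification core runs at f / n_cores
      - voltage scales linearly with frequency, but not below v_floor_ratio
      - leakage power is ignored and the work divides evenly
    """
    if n_cores < 1:
        raise ValueError("n_cores must be at least 1")
    if not 0.0 <= v_floor_ratio <= 1.0:
        raise ValueError("v_floor_ratio must be between 0 and 1")

    freq_ratio = 1.0 / n_cores                  # per-core frequency vs. full speed
    volt_ratio = max(freq_ratio, v_floor_ratio)  # voltage cannot drop below the floor
    power_per_core = volt_ratio ** 2 * freq_ratio

    # n cores at f/n finish the same work in the same wall-clock time as
    # one core at f, so the energy ratio equals the total power ratio.
    return n_cores * power_per_core


if __name__ == "__main__":
    for n in (1, 2, 4):
        print(n, relative_dynamic_energy(n), relative_dynamic_energy(n, v_floor_ratio=0.5))
```

Under these idealized assumptions, two verification cores use a quarter of the dynamic energy of one full-speed core and four cores use one sixteenth. With a voltage floor of half the nominal supply, four cores only reach one quarter, which suggests why the gain in practice depends on how far voltage can actually be scaled.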