Improving error tolerance for multithreaded register files

  • Authors:
  • Lei Wang;Niral Patel

  • Affiliations:
  • Department of Electrical and Computer Engineering, University of Connecticut, Storrs, CT;Department of Electrical and Computer Engineering, University of Connecticut, Storrs, CT

  • Venue:
  • IEEE Transactions on Very Large Scale Integration (VLSI) Systems
  • Year:
  • 2008

Abstract

Chip multithreaded computing faces the dual challenges of increasing system complexity and error sensitivity. It is critical to develop effective solutions that achieve better error tolerance without inducing performance degradation. In this paper, we propose a new error-tolerant memory design based on a unique computing phenomenon referred to as dynamic multithreading redundancy (DMR). The proposed technique exploits the interplay between concurrent threads for runtime error control. We also present two DMR enhancements, immediate write-back and self-recovery, to address the error accumulation effect. A multithreaded register file was implemented to demonstrate the proposed DMR technique. Simulation results on the SPEC CPU2000 benchmarks demonstrate a significant reduction in the performance and energy overhead of error recovery. In addition, the proposed technique scales well with instruction-level and thread-level parallelism, which matters for next-generation processor design, where the soft error problem is expected to get worse due to technology scaling and architectural trends.