Cost-effective soft-error protection for SRAM-based structures in GPGPUs

  • Authors:
  • Jingweijia Tan;Zhi Li;Xin Fu

  • Affiliations:
  • University of Kansas, Lawrence, KS;University of Kansas, Lawrence, KS;University of Kansas, Lawrence, KS

  • Venue:
  • Proceedings of the ACM International Conference on Computing Frontiers
  • Year:
  • 2013

Quantified Score

Hi-index 0.00

Visualization

Abstract

The general-purpose computing on graphics processing units (GPGPUs) are increasingly used to accelerate parallel applications. This makes reliability a growing concern in GPUs as they are originally designed for graphics processing with relaxed requirements for execution correctness. With CMOS processing technologies continuously scaling down to the nano-scale, on-chip soft error rate (SER) has been predicted to increase exponentially. GPGPUs with hundreds of cores integrated into a single chip are prone to manifest high SER. This paper aims to enhance the GPGPU reliability in light of soft errors. We leverage the GPGPU microarchitecture characteristics, and propose energy-efficient protection mechanisms for two typical SRAM-based structures (i.e. instruction buffer and registers) which suffer high susceptibility. We develop Similarity-AWare Protection (SAWP) scheme that leverages the instruction similarity to provide the near-full ECC protection to the instruction buffer with quite little area and power overhead. Based on the observation that shared memory usually exhibits low utilization, we propose SHAred memory to Register Protection (SHARP) scheme, it intelligently leverages shared memory to hold the ECCs of registers. Experimental results show that our techniques have the strong capability of substantially improving the structure vulnerability, and significantly reducing the power consumption compared to the full ECC protection mechanism.