Hauberk: Lightweight Silent Data Corruption Error Detector for GPGPU

  • Authors:
  • Keun Soo Yim;Cuong Pham;Mushfiq Saleheen;Zbigniew Kalbarczyk;Ravishankar Iyer

  • Venue:
  • IPDPS '11 Proceedings of the 2011 IEEE International Parallel & Distributed Processing Symposium
  • Year:
  • 2011

Abstract

The high performance and relatively low cost of GPU-based platforms make them an attractive alternative for general-purpose high performance computing (HPC). However, emerging HPC applications usually have stricter output correctness requirements than typical GPU applications (i.e., 3D graphics). This paper first analyzes the error resiliency of GPGPU platforms using a fault injection tool we have developed for commodity GPU devices. On average, 16-33% of injected faults cause silent data corruption (SDC) errors in the HPC programs executing on the GPU. This SDC ratio is significantly higher than that measured in CPU programs (