Exploiting uniform vector instructions for GPGPU performance, energy efficiency, and opportunistic reliability enhancement

Authors:
Ping Xiang;Yi Yang;Mike Mantor;Norm Rubin;Lisa R. Hsu;Huiyang Zhou
Affiliations:
North Carolina State University, Raleigh, NC, USA;NEC Laboratories America, Princeton, NJ, USA;AMD Inc., Orlando, FL, USA;AMD Inc., Orlando, FL, USA;AMD Inc., Orlando, FL, USA;North Carolina State University, Raleigh, NC, USA
Venue:
Proceedings of the 27th international ACM conference on International conference on supercomputing
Year:
2013

Citing 11
Cited 0

Dynamic instruction reuse

Proceedings of the 24th annual international symposium on Computer architecture
Opportunistic Transient-Fault Detection

Proceedings of the 32nd annual international symposium on Computer Architecture
A hardware redundancy and recovery mechanism for reliable scientific computation on graphics processors

Proceedings of the 22nd ACM SIGGRAPH/EUROGRAPHICS symposium on Graphics hardware
Understanding software approaches for GPGPU reliability

Proceedings of 2nd Workshop on General Purpose Processing on Graphics Processing Units
Prefetch-Aware DRAM Controllers

Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
McPAT: an integrated power, area, and timing modeling framework for multicore and manycore architectures

Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
Rodinia: A benchmark suite for heterogeneous computing

IISWC '09 Proceedings of the 2009 IEEE International Symposium on Workload Characterization (IISWC)
Dynamic detection of uniform and affine vectors in GPGPU computations

Euro-Par'09 Proceedings of the 2009 international conference on Parallel processing
Minimal Multi-threading: Finding and Removing Redundant Instructions in Multi-threaded Processors

MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
Divergence Analysis and Optimizations

PACT '11 Proceedings of the 2011 International Conference on Parallel Architectures and Compilation Techniques
Power-efficient computing for compute-intensive GPGPU applications

Proceedings of the 21st international conference on Parallel architectures and compilation techniques

Quantified Score

Hi-index	0.00

Visualization

Abstract

State-of-art graphics processing units (GPUs) employ the single-instruction multiple-data (SIMD) style execution to achieve both high computational throughput and energy efficiency. As previous works have shown, there exists significant computational redundancy in SIMD execution, where different execution lanes operate on the same operand values. Such value locality is referred to as uniform vectors. In this paper, we first show that besides redundancy within a uniform vector, different vectors can also have the identical values. Then, we propose detailed architecture designs to exploit both types of redundancy. For redundancy within a uniform vector, we propose to either extend the vector register file with token bits or add a separate small scalar register file to eliminate redundant computations as well as redundant data storage. For redundancy across different uniform vectors, we adopt instruction reuse, proposed originally for CPU architectures, to detect and eliminate redundancy. The elimination of redundant computations and data storage leads to both significant energy savings and performance improvement. Furthermore, we propose to leverage such redundancy to protect arithmetic-logic units (ALUs) and register files against hardware errors. Our detailed evaluation shows that our proposed design has low hardware overhead and achieves performance gains, up to 23.9% and 12.0% on average, along with energy savings, up to 24.8% and 12.6% on average, as well as a 21.1% and 14.1% protection coverage for ALUs and register files, respectively.