Dynamic detection of uniform and affine vectors in GPGPU computations

Authors:
Sylvain Collange, David Defour, Yao Zhang
Affiliations:
ELIAUS, Université de Perpignan, Perpignan, France (Collange, Defour); ECE Department, University of California, Davis (Zhang)
Venue:
Euro-Par '09: Proceedings of the 2009 International Conference on Parallel Processing
Year:
2009

Abstract

We present a hardware mechanism that dynamically detects uniform and affine vectors in SPMD architectures such as Graphics Processing Units (GPUs), in order to minimize pressure on the register file and reduce power consumption with minimal architectural modifications. A uniform vector holds the same value in every lane, while an affine vector holds the value base + i × stride in lane i, so either kind can be stored and operated on in compressed form instead of as one value per lane. A preliminary experimental analysis conducted with the Barra simulator shows that this optimization can benefit up to 34% of register-file reads and 22% of the computations in common GPGPU applications.
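To make the two vector classes concrete, here is a minimal software model, not the hardware design from the paper. It assumes 32 lanes per warp and integer register values, and the names `Affine`, `detect`, `add`, and `mul` are hypothetical. It shows how a 32-lane register collapses to a (base, stride) pair, and why typical address arithmetic such as `base_ptr + tid * 4` keeps vectors in that compressed form.

```python
from dataclasses import dataclass
from typing import List, Optional

WARP_SIZE = 32  # assumption: 32 lanes per warp, as on NVIDIA Tesla-class GPUs


@dataclass(frozen=True)
class Affine:
    """Compressed vector whose lane i holds base + i * stride.
    A uniform vector is the special case stride == 0."""
    base: int
    stride: int

    def is_uniform(self) -> bool:
        return self.stride == 0

    def expand(self, width: int = WARP_SIZE) -> List[int]:
        # Rebuild the full per-lane vector from two stored words.
        return [self.base + i * self.stride for i in range(width)]


def detect(lanes: List[int]) -> Optional[Affine]:
    """Return the (base, stride) encoding if lanes form an affine sequence, else None."""
    if not lanes:
        return None
    base = lanes[0]
    stride = lanes[1] - lanes[0] if len(lanes) > 1 else 0
    for i, v in enumerate(lanes):
        if v != base + i * stride:
            return None  # generic vector: must be stored lane by lane
    return Affine(base, stride)


def add(a: Affine, b: Affine) -> Affine:
    # (a.base + i*a.stride) + (b.base + i*b.stride) is still affine in i.
    return Affine(a.base + b.base, a.stride + b.stride)


def mul(a: Affine, b: Affine) -> Optional[Affine]:
    # affine * uniform stays affine; affine * affine is quadratic in i.
    if b.is_uniform():
        return Affine(a.base * b.base, a.stride * b.base)
    if a.is_uniform():
        return Affine(b.base * a.base, b.stride * a.base)
    return None


# Usage: a per-thread address computed from the thread index.
tid = detect(list(range(WARP_SIZE)))  # thread index within the warp
four = Affine(4, 0)                   # uniform constant (element size)
ptr = Affine(0x1000, 0)               # uniform base address (hypothetical)
addr = add(ptr, mul(tid, four))
print(addr)  # Affine(base=4096, stride=4)
```

The model tracks the stride explicitly rather than only flagging "uniform", because affine values such as thread indices and array addresses are common in GPGPU kernels and would otherwise fall back to full per-lane storage.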