We present a hardware mechanism that dynamically detects uniform and affine vectors in SPMD architectures such as Graphics Processing Units (GPUs), in order to minimize pressure on the register file and reduce power consumption with minimal architectural modifications. A preliminary experimental analysis conducted with the Barra simulator shows that this optimization can benefit up to 34% of register file reads and 22% of the computations in common GPGPU applications.
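To make the terms concrete, the following is a minimal software sketch of what "uniform" and "affine" mean for a vector register, assuming a uniform vector holds the same value in every lane and an affine vector holds values of the form base + lane * stride. The function name `classify_vector` and its return format are hypothetical and chosen for illustration only; the abstract describes a hardware mechanism, which this sketch does not attempt to reproduce.

```python
from typing import Optional, Sequence, Tuple


def classify_vector(values: Sequence[int]) -> Tuple[str, Optional[int], Optional[int]]:
    """Classify per-lane register values as uniform, affine, or generic.

    Hypothetical helper for illustration; returns (kind, base, stride).
    For a generic vector, base and stride are None.
    """
    if not values:
        raise ValueError("vector must contain at least one lane")

    base = values[0]
    # A single lane is trivially uniform.
    stride = values[1] - values[0] if len(values) > 1 else 0

    # Every lane must match base + lane * stride to qualify.
    for lane, value in enumerate(values):
        if value != base + lane * stride:
            return ("generic", None, None)

    if stride == 0:
        return ("uniform", base, 0)
    return ("affine", base, stride)


if __name__ == "__main__":
    warp_size = 32  # assumed lane count for the demonstration
    print(classify_vector([7] * warp_size))                          # same value in all lanes
    print(classify_vector([100 + 4 * i for i in range(warp_size)]))  # strided addresses
    print(classify_vector([0, 1, 3, 4]))                             # no single stride fits
```

Under these assumptions, a value broadcast to all threads (such as a kernel argument) would classify as uniform, while a per-thread index or a strided address computed from it would classify as affine. Such vectors can be summarized by a base and a stride instead of one value per lane, which is the kind of redundancy the abstract aims to exploit.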