The peak performance of graphics processing units (GPUs) has traditionally been increased by adding compute resources and/or raising their clock frequency. However, both approaches significantly increase GPU power consumption. Consequently, modern high-performance GPUs are power constrained, and future processors must rely on more power-efficient approaches to improve performance. In this paper, we propose three power-efficient techniques for improving GPU performance. First, we observe that many GPGPU applications are integer-instruction intensive. For such applications, we propose using the fused multiply-add (FMA) units to fuse dependent integer instructions into a composite instruction, which improves both power efficiency and performance by reducing the number of instructions fetched and executed. Second, GPUs often perform computations that are duplicated across multiple threads; we dynamically detect such instructions and execute them in a separate scalar pipeline. Finally, register file bandwidth is a critical GPU resource and is optimized for 32-bit instruction operands, yet many operands need considerably fewer bits for accurate representation and computation. We therefore propose a sliced GPU architecture that improves performance by dual-issuing instructions to two 16-bit execution slices. Overall, our techniques improve power efficiency by more than 25% (geometric mean).
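To make the three eligibility conditions concrete, here is a minimal software sketch, not the hardware mechanism described in the paper. It assumes a toy instruction format (the `Instr` class), uses a multiply followed by a dependent add as the example of a fusible integer pair, treats an instruction as duplicated across threads when every source operand holds the same value in all lanes of a 32-thread warp, and treats an instruction as slice-eligible when its operands and result fit in a signed 16-bit range. All names (`fuse_mul_add`, `is_scalar`, `fits_slice`, `can_dual_issue`) and the rules they encode are hypothetical illustrations.

```python
from dataclasses import dataclass
import operator

# Hypothetical toy model: names and eligibility rules are illustrative assumptions.

@dataclass(frozen=True)
class Instr:
    op: str        # e.g. "mul", "add", "mad"
    dst: str       # destination register
    srcs: tuple    # source registers

def fuse_mul_add(prog):
    """Fuse `t = a*b; d = t + c` into `d = mad(a, b, c)` when the two
    instructions are adjacent and t is not read by any later instruction
    (a conservative rule: the intermediate value must be dead)."""
    out, i = [], 0
    while i < len(prog):
        cur = prog[i]
        nxt = prog[i + 1] if i + 1 < len(prog) else None
        if (nxt is not None and cur.op == "mul" and nxt.op == "add"
                and nxt.srcs.count(cur.dst) == 1
                and all(cur.dst not in p.srcs for p in prog[i + 2:])):
            c = next(s for s in nxt.srcs if s != cur.dst)
            out.append(Instr("mad", nxt.dst, cur.srcs + (c,)))
            i += 2  # two fetched/executed instructions become one
        else:
            out.append(cur)
            i += 1
    return out

def is_scalar(operand_lanes):
    """operand_lanes[k][t] is the value of source operand k in thread t.
    If every operand is uniform across the warp, the result is the same in
    every thread, so the instruction could run once on a scalar pipeline."""
    return all(len(set(lanes)) == 1 for lanes in operand_lanes)

INT16_MIN, INT16_MAX = -(1 << 15), (1 << 15) - 1
OPS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}

def fits_slice(op, a, b):
    """True if both operands and the result fit in a signed 16-bit slice."""
    r = OPS[op](a, b)
    return all(INT16_MIN <= v <= INT16_MAX for v in (a, b, r))

def can_dual_issue(i1, i2):
    """Two narrow instructions (independence assumed, checked by the caller)
    could each occupy one 16-bit half of a 32-bit datapath in the same cycle."""
    return fits_slice(*i1) and fits_slice(*i2)

prog = [Instr("mul", "t", ("a", "b")),
        Instr("add", "d", ("t", "c")),
        Instr("add", "e", ("d", "d"))]
fused = fuse_mul_add(prog)
print(len(prog), "->", len(fused))                        # 3 -> 2
print(fused[0])  # Instr(op='mad', dst='d', srcs=('a', 'b', 'c'))
print(is_scalar([[4] * 32, [9] * 32]))                    # True
print(is_scalar([list(range(32)), [9] * 32]))             # False
print(can_dual_issue(("add", 100, 200), ("sub", 5, 7)))   # True
print(can_dual_issue(("mul", 300, 300), ("add", 1, 2)))   # False: 90000 overflows 16 bits
```

The value-based uniformity test and the result-width check are simplifications chosen for readability; a hardware implementation would have to make these decisions cheaply in the pipeline, and the composite instructions supported on the FMA units may differ from the single multiply-add pattern shown here.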