Graphics processing units (GPUs) provide increasingly high performance through programmable internal processors, namely vertex processors (VPs) and fragment processors (FPs). These capabilities motivate general-purpose computation on GPUs (GPGPU) beyond graphics applications. Although VPs and FPs are connected in a pipeline, many GPGPU implementations use only the FPs as the computational engine of the GPU. Such implementations may suffer lower performance because the FPs, which are far more heavily loaded than the VPs, become the bottleneck of the pipeline. The objective of our work is to improve the performance of GPGPU programs by eliminating this bottleneck. To this end, we present a code motion technique that reduces the FP workload by moving assembly instructions from the FP program to the VP program. We also define which instructions are movable, that is, which instructions can be moved without changing the I/O specification between the CPU and the GPU. Experimental results show that (1) our technique reduces the execution time of a Gaussian filter program by approximately 40% and (2) it reduces the FP workload in 10 out of 18 GPGPU programs.
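The idea behind moving work from the FP program to the VP program can be illustrated with a small CPU-side sketch. It assumes one plausible condition for movability that the abstract itself does not spell out: an operation that is affine in an interpolated vertex attribute gives the same result whether it runs per fragment after interpolation or per vertex before interpolation. The names `lerp`, `offset_coord`, `fp_only`, and `vp_hoisted` are hypothetical and only model the pipeline; they are not the paper's implementation or its definition of movable instructions.

```python
def lerp(a, b, t):
    """Linear interpolation, standing in for the rasterizer's attribute interpolation."""
    return tuple((1.0 - t) * x + t * y for x, y in zip(a, b))


def offset_coord(tc, dx=0.25, dy=0.0):
    """Hypothetical 'instruction': add a constant offset to a texture coordinate.

    The operation is affine in tc, which is the property assumed here
    to make it safe to move.
    """
    return (tc[0] + dx, tc[1] + dy)


def fp_only(v0, v1, ts):
    """Baseline: each fragment receives the interpolated coordinate, then offsets it."""
    return [offset_coord(lerp(v0, v1, t)) for t in ts]


def vp_hoisted(v0, v1, ts):
    """After code motion: offset once per vertex; fragments only interpolate."""
    w0, w1 = offset_coord(v0), offset_coord(v1)
    return [lerp(w0, w1, t) for t in ts]


def max_abs_diff(xs, ys):
    """Largest component-wise difference between two lists of coordinates."""
    return max(abs(p - q) for x, y in zip(xs, ys) for p, q in zip(x, y))


if __name__ == "__main__":
    ts = [i / 8 for i in range(9)]          # nine fragments along one edge
    a = fp_only((0.0, 0.0), (1.0, 0.5), ts)
    b = vp_hoisted((0.0, 0.0), (1.0, 0.5), ts)
    print(max_abs_diff(a, b))
```

In this model the baseline applies the offset once per fragment (nine times), while the hoisted version applies it once per vertex (twice). A non-affine operation, such as squaring a coordinate, would not commute with interpolation in this way, so it could not be moved under this assumption.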