Compilers: principles, techniques, and tools
Compilers: principles, techniques, and tools
Iterative Optimization in the Polyhedral Model: Part I, One-Dimensional Time
Proceedings of the International Symposium on Code Generation and Optimization
Optimization principles and application performance evaluation of a multithreaded GPU using CUDA
Proceedings of the 13th ACM SIGPLAN Symposium on Principles and practice of parallel programming
Program optimization space pruning for a multithreaded gpu
Proceedings of the 6th annual IEEE/ACM international symposium on Code generation and optimization
A compiler framework for optimization of affine loop nests for gpgpus
Proceedings of the 22nd annual international conference on Supercomputing
High performance discrete Fourier transforms on graphics processors
Proceedings of the 2008 ACM/IEEE conference on Supercomputing
Benchmarking GPUs to tune dense linear algebra
Proceedings of the 2008 ACM/IEEE conference on Supercomputing
CUDA-Lite: Reducing GPU Programming Complexity
Languages and Compilers for Parallel Computing
MCUDA: An Efficient Implementation of CUDA Kernels for Multi-core CPUs
Languages and Compilers for Parallel Computing
OpenMP to GPGPU: a compiler framework for automatic translation and optimization
Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming
An analytical model for a GPU architecture with memory-level and thread-level parallelism awareness
Proceedings of the 36th annual international symposium on Computer architecture
A cross-input adaptive framework for GPU program optimizations
IPDPS '09 Proceedings of the 2009 IEEE International Symposium on Parallel&Distributed Processing
An adaptive performance modeling tool for GPU architectures
Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
A GPGPU compiler for memory optimization and parallelism management
PLDI '10 Proceedings of the 2010 ACM SIGPLAN conference on Programming language design and implementation
A large-scale cross-architecture evaluation of thread-coarsening
SC '13 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
The Implementation of a High Performance GPGPU Compiler
International Journal of Parallel Programming
Hi-index | 0.00 |
This article presents a novel optimizing compiler for general purpose computation on graphics processing units (GPGPU). It addresses two major challenges of developing high performance GPGPU programs: effective utilization of GPU memory hierarchy and judicious management of parallelism. The input to our compiler is a naïve GPU kernel function, which is functionally correct but without any consideration for performance optimization. The compiler generates two kernels, one optimized for global memories and the other for texture memories. The proposed compilation process is effective for both AMD/ATI and NVIDIA GPUs. The experiments show that our optimized code achieves very high performance, either superior or very close to highly fine-tuned libraries.