Brook for GPUs: stream computing on graphics hardware
ACM SIGGRAPH 2004 Papers
Data and Computation Transformations for Brook Streaming Applications on Multiprocessors
Proceedings of the International Symposium on Code Generation and Optimization
A performance-oriented data parallel virtual machine for GPUs
ACM SIGGRAPH 2006 Sketches
OpenMP to GPGPU: a compiler framework for automatic translation and optimization
Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming
Rodinia: A benchmark suite for heterogeneous computing
IISWC '09 Proceedings of the 2009 IEEE International Symposium on Workload Characterization (IISWC)
Proceedings of the 3rd Workshop on General-Purpose Computation on Graphics Processing Units
An effective GPU implementation of breadth-first search
Proceedings of the 47th Design Automation Conference
OpenMPC: Extended OpenMP Programming and Tuning for GPUs
Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis
hiCUDA: High-Level GPGPU Programming
IEEE Transactions on Parallel and Distributed Systems
The International Exascale Software Project roadmap
International Journal of High Performance Computing Applications
IWOMP'11 Proceedings of the 7th international conference on OpenMP in the Petascale era
Keeneland: Bringing Heterogeneous GPU Computing to the Computational Science Community
Computing in Science and Engineering
The tradeoffs of fused memory hierarchies in heterogeneous computing architectures
Proceedings of the 9th conference on Computing Frontiers
Experiences with high-level programming directives for porting applications to GPUs
Facing the Multicore-Challenge II
Accelerating financial applications on the GPU
Proceedings of the 6th Workshop on General Purpose Processor Using Graphics Processing Units
Semi-automatic restructuring of offloadable tasks for many-core accelerators
SC '13 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
On Expressing Strategies for Directive-Driven Multicore Programing Models
Proceedings of Workshop on Parallel Programming and Run-Time Management Techniques for Many-core Architectures and Design Tools and Architectures for Multicore Embedded Computing Platforms
Hi-index | 0.00 |
Graphics Processing Unit (GPU)-based parallel computer architectures have shown increased popularity as a building block for high performance computing, and possibly for future Exascale computing. However, their programming complexity remains as a major hurdle for their widespread adoption. To provide better abstractions for programming GPU architectures, researchers and vendors have proposed several directive-based GPU programming models. These directive-based models provide different levels of abstraction, and required different levels of programming effort to port and optimize applications. Understanding these differences among these new models provides valuable insights on their applicability and performance potential. In this paper, we evaluate existing directive-based models by porting thirteen application kernels from various scientific domains to use CUDA GPUs, which, in turn, allows us to identify important issues in the functionality, scalability, tunability, and debuggability of the existing models. Our evaluation shows that directive-based models can achieve reasonable performance, compared to hand-written GPU codes.