Early evaluation of directive-based GPU programming models for productive exascale computing

Authors:
Seyong Lee;Jeffrey S. Vetter
Affiliations:
Oak Ridge National Laboratory;Oak Ridge National Laboratory, Georgia Institute of Technology
Venue:
SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Year:
2012

Citing 15
Cited 3

Brook for GPUs: stream computing on graphics hardware

ACM SIGGRAPH 2004 Papers
Data and Computation Transformations for Brook Streaming Applications on Multiprocessors

Proceedings of the International Symposium on Code Generation and Optimization
A performance-oriented data parallel virtual machine for GPUs

ACM SIGGRAPH 2006 Sketches
OpenMP to GPGPU: a compiler framework for automatic translation and optimization

Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming
Rodinia: A benchmark suite for heterogeneous computing

IISWC '09 Proceedings of the 2009 IEEE International Symposium on Workload Characterization (IISWC)
A mapping path for multi-GPGPU accelerated computers from a portable high level programming abstraction

Proceedings of the 3rd Workshop on General-Purpose Computation on Graphics Processing Units
An effective GPU implementation of breadth-first search

Proceedings of the 47th Design Automation Conference
OpenMPC: Extended OpenMP Programming and Tuning for GPUs

Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis
hiCUDA: High-Level GPGPU Programming

IEEE Transactions on Parallel and Distributed Systems
The International Exascale Software Project roadmap

International Journal of High Performance Computing Applications
OpenMP for accelerators

IWOMP'11 Proceedings of the 7th international conference on OpenMP in the Petascale era
Keeneland: Bringing Heterogeneous GPU Computing to the Computational Science Community

Computing in Science and Engineering
Performance Implications of Nonuniform Device Topologies in Scalable Heterogeneous Architectures

IEEE Micro
The tradeoffs of fused memory hierarchies in heterogeneous computing architectures

Proceedings of the 9th conference on Computing Frontiers
Experiences with high-level programming directives for porting applications to GPUs

Facing the Multicore-Challenge II

Accelerating financial applications on the GPU

Proceedings of the 6th Workshop on General Purpose Processor Using Graphics Processing Units
Semi-automatic restructuring of offloadable tasks for many-core accelerators

SC '13 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
On Expressing Strategies for Directive-Driven Multicore Programing Models

Proceedings of Workshop on Parallel Programming and Run-Time Management Techniques for Many-core Architectures and Design Tools and Architectures for Multicore Embedded Computing Platforms

Quantified Score

Hi-index	0.00

Visualization

Abstract

Graphics Processing Unit (GPU)-based parallel computer architectures have shown increased popularity as a building block for high performance computing, and possibly for future Exascale computing. However, their programming complexity remains as a major hurdle for their widespread adoption. To provide better abstractions for programming GPU architectures, researchers and vendors have proposed several directive-based GPU programming models. These directive-based models provide different levels of abstraction, and required different levels of programming effort to port and optimize applications. Understanding these differences among these new models provides valuable insights on their applicability and performance potential. In this paper, we evaluate existing directive-based models by porting thirteen application kernels from various scientific domains to use CUDA GPUs, which, in turn, allows us to identify important issues in the functionality, scalability, tunability, and debuggability of the existing models. Our evaluation shows that directive-based models can achieve reasonable performance, compared to hand-written GPU codes.