OpenMP [14] is the dominant programming model for shared-memory parallelism in C, C++ and Fortran due to its easy-to-use directive-based style, portability and broad support by compiler vendors. Compute-intensive application regions are increasingly being accelerated using devices such as GPUs and DSPs, and a programming model with similar characteristics is needed for these devices. This paper presents extensions to OpenMP that provide such a programming model. Our results demonstrate that a high-level programming model can provide accelerated performance comparable to that of hand-coded implementations in CUDA.
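To make the directive-based style concrete, the following is a minimal sketch of offloading a loop to an accelerator. It is an illustration only: it assumes the `target` and `map` syntax later standardized in OpenMP 4.0, which is not necessarily the syntax proposed in this paper, and the function name `saxpy` and its parameters are hypothetical.

```c
#include <stddef.h>

/* Hypothetical example: y = a * x + y, offloaded via a directive.
 * Assumption: uses the OpenMP 4.0+ `target` construct, not the
 * paper's own proposed extensions. */
void saxpy(size_t n, float a, const float *x, float *y)
{
    /* map(to:) copies x to the device before the region runs;
     * map(tofrom:) copies y to the device and back afterwards. */
    #pragma omp target teams distribute parallel for \
        map(to: x[0:n]) map(tofrom: y[0:n])
    for (size_t i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

The loop body is unchanged from the serial version; only the directive expresses the offload and the data movement. When compiled without OpenMP support the pragma is ignored and the loop runs serially on the host, which is one reason the directive style is considered portable.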