Graphics Processing Units (GPUs) have become a competitive accelerator for applications outside the graphics domain, mainly driven by the improvements in GPU programmability. Although the Compute Unified Device Architecture (CUDA) is a simple C-like interface for programming NVIDIA GPUs, porting applications to CUDA remains a challenge to average programmers. In particular, CUDA places on the programmer the burden of packaging GPU code in separate functions, of explicitly managing data transfer between the host and GPU memories, and of manually optimizing the utilization of the GPU memory. Practical experience shows that the programmer needs to make significant code changes, often tedious and error-prone, before getting an optimized program. We have designed hiCUDA, a high-level directive-based language for CUDA programming. It allows programmers to perform these tedious tasks in a simpler manner and directly to the sequential code, thus speeding up the porting process. In this paper, we describe the hiCUDA directives as well as the design and implementation of a prototype compiler that translates a hiCUDA program to a CUDA program. Our compiler is able to support real-world applications that span multiple procedures and use dynamically allocated arrays. Experiments using nine CUDA benchmarks show that the simplicity hiCUDA provides comes at no expense to performance.
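To make the porting burden described above concrete, the following is a minimal sketch of a hand-written CUDA vector addition. It is not taken from the paper and does not show hiCUDA directive syntax; the names `vec_add_kernel`, `vec_add`, and `check` are hypothetical and chosen only for illustration. The sketch shows the first two tasks the abstract lists (a separate kernel function and explicit host/GPU transfers); the third, manual memory optimization such as staging data in shared memory, is omitted for brevity.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Task 1: the loop body must be packaged as a separate __global__ function.
// Each GPU thread handles one iteration of the original sequential loop.
__global__ void vec_add_kernel(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        c[i] = a[i] + b[i];
    }
}

// Hypothetical helper: abort with a message if a CUDA runtime call fails.
static void check(cudaError_t err, const char *what) {
    if (err != cudaSuccess) {
        std::fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        std::exit(EXIT_FAILURE);
    }
}

// Host wrapper (hypothetical name). Assumes a and b have the same length.
std::vector<float> vec_add(const std::vector<float> &a, const std::vector<float> &b) {
    int n = static_cast<int>(a.size());
    std::vector<float> c(n);
    if (n == 0) {
        return c;  // nothing to launch
    }
    size_t bytes = static_cast<size_t>(n) * sizeof(float);

    // Task 2: GPU memory is allocated and filled explicitly by the programmer.
    float *d_a = nullptr, *d_b = nullptr, *d_c = nullptr;
    check(cudaMalloc(&d_a, bytes), "cudaMalloc(d_a)");
    check(cudaMalloc(&d_b, bytes), "cudaMalloc(d_b)");
    check(cudaMalloc(&d_c, bytes), "cudaMalloc(d_c)");
    check(cudaMemcpy(d_a, a.data(), bytes, cudaMemcpyHostToDevice), "copy a to GPU");
    check(cudaMemcpy(d_b, b.data(), bytes, cudaMemcpyHostToDevice), "copy b to GPU");

    // The launch geometry is also chosen by hand.
    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    vec_add_kernel<<<blocks, threads>>>(d_a, d_b, d_c, n);
    check(cudaGetLastError(), "kernel launch");

    // The result must be copied back explicitly; this copy also waits for the kernel.
    check(cudaMemcpy(c.data(), d_c, bytes, cudaMemcpyDeviceToHost), "copy c to host");

    cudaFree(d_a);
    cudaFree(d_b);
    cudaFree(d_c);
    return c;
}
```

The sequential original is a single loop, `for (i = 0; i < n; i++) c[i] = a[i] + b[i];`. Nearly everything else in the sketch is bookkeeping, which is the kind of code change the abstract describes as tedious and error-prone.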