hiCUDA: High-Level GPGPU Programming

Authors: Tianyi David Han; Tarek S. Abdelrahman
Affiliations: University of Toronto, Toronto; University of Toronto, Toronto
Venue: IEEE Transactions on Parallel and Distributed Systems
Year: 2011


Abstract

Graphics Processing Units (GPUs) have become a competitive accelerator for applications outside the graphics domain, mainly driven by the improvements in GPU programmability. Although the Compute Unified Device Architecture (CUDA) is a simple C-like interface for programming NVIDIA GPUs, porting applications to CUDA remains a challenge to average programmers. In particular, CUDA places on the programmer the burden of packaging GPU code in separate functions, of explicitly managing data transfer between the host and GPU memories, and of manually optimizing the utilization of the GPU memory. Practical experience shows that the programmer needs to make significant code changes, often tedious and error-prone, before getting an optimized program. We have designed hiCUDA, a high-level directive-based language for CUDA programming. It allows programmers to perform these tedious tasks in a simpler manner and directly to the sequential code, thus speeding up the porting process. In this paper, we describe the hiCUDA directives as well as the design and implementation of a prototype compiler that translates a hiCUDA program to a CUDA program. Our compiler is able to support real-world applications that span multiple procedures and use dynamically allocated arrays. Experiments using nine CUDA benchmarks show that the simplicity hiCUDA provides comes at no expense to performance.
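
To make the burdens named in the abstract concrete, the following is a minimal sketch of a hand-written CUDA vector addition. It is not taken from the paper: the names `vec_add_kernel` and `vec_add_on_gpu`, the block size of 256, and the choice of vector addition are all assumptions made for illustration. It shows the first two burdens (a separate kernel function and explicit host/GPU transfers). The third, manual optimization of GPU memory use, is not shown, since vector addition has no data reuse to exploit.

```cuda
// Illustrative only: plain CUDA written by hand, not hiCUDA and not
// code from the paper. All function and variable names are hypothetical.
#include <cuda_runtime.h>
#include <cstddef>

// Burden 1: the GPU code must be packaged in its own __global__ function.
__global__ void vec_add_kernel(const float *a, const float *b, float *c, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {  // guard: the last thread block may be only partly used
        c[i] = a[i] + b[i];
    }
}

// Host wrapper. Returns true on success, false on invalid n or a CUDA error.
bool vec_add_on_gpu(const float *h_a, const float *h_b, float *h_c, int n)
{
    if (n <= 0) {
        return false;
    }
    const size_t bytes = static_cast<size_t>(n) * sizeof(float);
    float *d_a = nullptr;
    float *d_b = nullptr;
    float *d_c = nullptr;

    // Burden 2: explicit GPU allocation and host-to-GPU copies.
    // The && chain stops at the first call that fails.
    bool ok = cudaMalloc((void **)&d_a, bytes) == cudaSuccess
           && cudaMalloc((void **)&d_b, bytes) == cudaSuccess
           && cudaMalloc((void **)&d_c, bytes) == cudaSuccess
           && cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice) == cudaSuccess
           && cudaMemcpy(d_b, h_b, bytes, cudaMemcpyHostToDevice) == cudaSuccess;

    if (ok) {
        const int threads = 256;  // assumed block size
        const int blocks = (n + threads - 1) / threads;  // round up
        vec_add_kernel<<<blocks, threads>>>(d_a, d_b, d_c, n);

        // Check the launch, then copy the result back to the host.
        // The blocking cudaMemcpy also waits for the kernel to finish.
        ok = cudaGetLastError() == cudaSuccess
          && cudaMemcpy(h_c, d_c, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }

    // Freeing a null pointer is a no-op, so this is safe after a failure.
    cudaFree(d_a);
    cudaFree(d_b);
    cudaFree(d_c);
    return ok;
}
```

A caller passes three host arrays of length `n` and checks the returned flag. The equivalent sequential code is a single loop over `c[i] = a[i] + b[i]`, and the surrounding allocation, copy, launch, and cleanup code is the kind of manual restructuring that the abstract says hiCUDA directives are meant to replace.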