Nowadays, not only CPUs but also GPUs follow the trend toward multi-core processors. Parallel processing presents both an opportunity and a challenge: explicitly parallelizing software, whether by hand or by compiler, is the key to exploiting the performance of multi-core chips. In this paper, we first survey several OpenMP-based automatic parallelization tools, which can save the time otherwise spent rewriting code for parallel execution on multi-core systems. We then focus on the ROSE compiler and explore it in depth. Finally, we implement an interface that reduces its complexity of use and apply automatic parallelization to generate CUDA code.