Nowadays, not only CPUs but also GPUs follow the trend toward multi-core processors. Parallel processing presents both an opportunity and a challenge: explicitly parallelizing software, whether by hand or by compiler, is the key to exploiting the performance of multi-core chips. In this paper, we first survey several OpenMP-based automatic parallelization tools, which can save the time otherwise spent rewriting code for parallel execution on multi-core systems. We then focus on the ROSE compiler and explore it in depth. Finally, we implement an interface that reduces its complexity of use and apply automatic parallelization to generate CUDA code.