OpenCL as a unified programming model for heterogeneous CPU/GPU clusters
Proceedings of the 17th ACM SIGPLAN symposium on Principles and Practice of Parallel Programming
SnuCL: an OpenCL framework for heterogeneous CPU/GPU clusters
Proceedings of the 26th ACM international conference on Supercomputing
Using compiler directives for accelerating CFD applications on GPUs
IWOMP'12 Proceedings of the 8th international conference on OpenMP in a Heterogeneous World
SnuCL and an MPI+OpenCL implementation of HPL on heterogeneous CPU/GPU clusters
Proceedings of the ATIP/A*CRC Workshop on Accelerator Technologies for High-Performance Computing: Does Asia Lead the Way?
Scaling analytics applications with OpenCL for loosely coupled heterogeneous clusters
Proceedings of the ACM International Conference on Computing Frontiers
Automatic OpenCL work-group size selection for multicore CPUs
PACT '13 Proceedings of the 22nd international conference on Parallel architectures and compilation techniques
Exploiting heterogeneous parallelism with the Heterogeneous Programming Library
Journal of Parallel and Distributed Computing
OpenCL framework for ARM processors with NEON support
Proceedings of the 2014 Workshop on Programming models for SIMD/Vector processing
Hi-index | 0.00 |
Heterogeneous parallel computing platforms, which are composed of different processors (e.g., CPUs, GPUs, FPGAs, and DSPs), are widening their user base in all computing domains. With this trend, parallel programming models need to achieve portability across different processors as well as high performance with reasonable programming effort. OpenCL (Open Computing Language) is an open standard and emerging parallel programming model to write parallel applications for such heterogeneous platforms. In this paper, we characterize the performance of an OpenCL implementation of the NAS Parallel Benchmark suite (NPB) on a heterogeneous parallel platform that consists of general-purpose CPUs and a GPU. We believe that understanding the performance characteristics of conventional workloads, such as the NPB, with an emerging programming model (i.e., OpenCL) is important for developers and researchers to adopt the programming model. We also compare the performance of the NPB in OpenCL to that of the OpenMP version. We describe the process of implementing the NPB in OpenCL and optimizations applied in our implementation. Experimental results and analysis show that the OpenCL version has different characteristics from the OpenMP version on multicore CPUs and exhibits different performance characteristics depending on different OpenCL compute devices. The results also indicate that the application needs to be rewritten or re-optimized for better performance on a different compute device although OpenCL provides source-code portability.