Performance characterization of the NAS Parallel Benchmarks in OpenCL

  • Authors:
  • Sangmin Seo;Gangwon Jo;Jaejin Lee

  • Affiliations:
  • Center for Manycore Programming, School of Computer Science and Engineering, Seoul National University, 151-744, Korea (all authors)

  • Venue:
  • IISWC '11 Proceedings of the 2011 IEEE International Symposium on Workload Characterization
  • Year:
  • 2011

Abstract

Heterogeneous parallel computing platforms, which combine different kinds of processors (e.g., CPUs, GPUs, FPGAs, and DSPs), are widening their user base in all computing domains. With this trend, parallel programming models need to achieve portability across different processors as well as high performance with reasonable programming effort. OpenCL (Open Computing Language) is an open standard and an emerging parallel programming model for writing parallel applications for such heterogeneous platforms. In this paper, we characterize the performance of an OpenCL implementation of the NAS Parallel Benchmarks (NPB) suite on a heterogeneous parallel platform that consists of general-purpose CPUs and a GPU. We believe that understanding the performance characteristics of conventional workloads, such as the NPB, under an emerging programming model (i.e., OpenCL) is important for developers and researchers considering adopting that model. We also compare the performance of the NPB in OpenCL to that of the OpenMP version. We describe the process of implementing the NPB in OpenCL and the optimizations applied in our implementation. Experimental results and analysis show that the OpenCL version has different characteristics from the OpenMP version on multicore CPUs and that its performance characteristics vary with the OpenCL compute device. The results also indicate that, although OpenCL provides source-code portability, an application needs to be rewritten or re-optimized to achieve better performance on a different compute device.