Designing OP2 for GPU architectures

Authors:
M. B. Giles; G. R. Mudalige; B. Spencer; C. Bertolli; I. Reguly
Affiliations:
Oxford e-Research Centre, University of Oxford, UK; Oxford e-Research Centre, University of Oxford, UK; Department of Computer Science, University of Oxford, UK; Department of Computing, Imperial College London, UK; Pázmány Péter Catholic University, Hungary
Venue:
Journal of Parallel and Distributed Computing
Year:
2013

Abstract

OP2 is an "active" library framework for the solution of unstructured mesh applications. It aims to decouple the specification of a scientific application from its parallel implementation, achieving code longevity and near-optimal performance by re-targeting the back-end to different multi-core/many-core hardware. This paper presents the design of the current OP2 library for generating efficient code targeting contemporary GPU platforms. We focus on some of the software architecture design choices and low-level optimizations that maximize performance on NVIDIA's Fermi architecture GPUs. The performance impact of these design choices is quantified on two NVIDIA GPUs (GTX560Ti, Tesla C2070) using the end-to-end performance of an industrially representative CFD application developed using the OP2 API. Results show that, for each system, a number of key configuration parameters need to be set carefully to obtain good performance. Using a recently developed auto-tuning framework, we explore the effect of these parameters and their limitations, and offer insights into optimizations for improved performance.