Design and initial performance of a high-level unstructured mesh framework on heterogeneous parallel systems

Authors:
G. R. Mudalige; M. B. Giles; J. Thiyagalingam; I. Z. Reguly; C. Bertolli; P. H. J. Kelly; A. E. Trefethen
Venue:
Parallel Computing
Year:
2013

Abstract

OP2 is a high-level, domain-specific library framework for developing unstructured mesh-based applications. It uses source-to-source translation and compilation so that a single application code written with the OP2 API can be transformed into multiple parallel implementations, each targeting a different back-end hardware platform. In this paper we present the design and performance of recent OP2 developments that enable code generation and execution on distributed-memory heterogeneous systems. OP2 targets numerical problems defined on static unstructured meshes, and we discuss the main design issues in parallelizing this class of applications: handling the data dependencies that arise when accessing indirectly referenced data, and generating code for execution on clusters of multi-threaded CPUs and GPUs. Two representative CFD applications written with OP2 are used to benchmark and analyze performance on a range of contrasting heterogeneous systems, including a large-scale Cray XE6 system and a large GPU cluster. The benchmarked metrics include runtime, scalability, achieved compute and bandwidth performance, runtime bottlenecks, and system energy consumption. We demonstrate that an application written once at a high level using OP2 is easily portable across a wide range of contrasting platforms and can achieve near-optimal performance without intervention from the domain application programmer.
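To illustrate the data-dependency problem mentioned above: in an edge-based loop over an unstructured mesh, each edge reaches its two nodes through an indirection map and increments values stored on those nodes. Two edges sharing a node would race if executed concurrently. Graph coloring is one standard remedy: edges are grouped into colors so that no two edges of the same color touch the same node, and each color can then be processed in parallel without atomics. The sketch below is a generic, hypothetical illustration of that technique; `color_edges` and `indirect_increment` are invented names, and this is not OP2's API or the code its generator emits.

```python
"""Hypothetical sketch: race-free indirect increments via greedy edge coloring.

Not OP2's API or generated code; a generic illustration of the technique.
"""
from typing import List, Sequence, Set, Tuple

Edge = Tuple[int, int]  # each edge maps to two node indices (the indirection map)


def color_edges(edges: Sequence[Edge], num_nodes: int) -> List[int]:
    """Greedily give each edge the smallest color not already used by an
    edge incident to either of its nodes, so same-colored edges share no node."""
    node_colors: List[Set[int]] = [set() for _ in range(num_nodes)]
    colors: List[int] = []
    for a, b in edges:
        used = node_colors[a] | node_colors[b]
        c = 0
        while c in used:
            c += 1
        colors.append(c)
        node_colors[a].add(c)
        node_colors[b].add(c)
    return colors


def indirect_increment(edges: Sequence[Edge], edge_flux: Sequence[float],
                       num_nodes: int, colors: Sequence[int]) -> List[float]:
    """Apply res[a] += f and res[b] -= f for every edge, one color at a time.

    Within a single color no two edges write the same node, so the inner
    loop could safely run in parallel; here it runs sequentially."""
    res = [0.0] * num_nodes
    num_colors = max(colors) + 1 if colors else 0
    for c in range(num_colors):
        for e, (a, b) in enumerate(edges):
            if colors[e] == c:
                res[a] += edge_flux[e]
                res[b] -= edge_flux[e]
    return res


if __name__ == "__main__":
    # A small 4-node mesh: a square (0-1-2-3) with one diagonal (0-2).
    edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
    flux = [1.0, 2.0, 3.0, 4.0, 5.0]
    colors = color_edges(edges, 4)
    print(colors)                                     # [0, 1, 0, 1, 2]
    print(indirect_increment(edges, flux, 4, colors))  # [2.0, 1.0, -4.0, 1.0]
```

Greedy coloring is chosen here only for brevity; it does not minimize the number of colors, and real frameworks may combine coloring with other strategies (for example, partitioning or atomic updates) depending on the target hardware.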