Finite volume methods are widely used numerical strategies for solving partial differential equations. This paper aims at obtaining a quantitative understanding of the achievable GPU performance of finite volume computations, in the context of the cell-centered finite volume method on 3D unstructured tetrahedral meshes. By using an optimized implementation and a synthetic connectivity matrix that exhibits a perfect structure of equal-sized blocks lying on the main diagonal, we can closely relate the achievable computing performance to the size of these diagonal blocks. Moreover, we have derived a theoretical model that identifies characteristic levels of the attainable performance as a function of the GPU's key hardware parameters. A realistic upper limit of the performance can thus be accurately predicted. For real-world tetrahedral meshes, the key to high performance lies in reordering the tetrahedra such that the resulting connectivity matrix resembles a block diagonal form, where the optimal block size depends on the GPU hardware. Performance can then be predicted accurately based on the success of the reordering. Numerical experiments confirm that the achieved performance is close to the practically attainable maximum, reaching 75% of the theoretical upper limit independently of the actual tetrahedral mesh considered.
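The reordering idea can be sketched in a few lines. The following is a minimal illustration only: the abstract does not specify the reordering algorithm, so reverse Cuthill-McKee (a classic bandwidth-reducing permutation) is used here as a stand-in for the paper's tetrahedra reordering, applied to a small hypothetical cell-connectivity matrix. The `bandwidth` helper is also introduced here for illustration.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import reverse_cuthill_mckee

def bandwidth(a):
    """Maximum |i - j| over the nonzeros of a sparse matrix."""
    coo = a.tocoo()
    return int(np.max(np.abs(coo.row - coo.col)))

# Hypothetical symmetric connectivity pattern (1 where two cells
# share a face), with deliberately scattered off-diagonal entries.
n = 8
pairs = [(0, 7), (1, 6), (2, 5), (3, 4), (0, 3), (4, 7)]
rows, cols = zip(*(pairs + [(j, i) for i, j in pairs]))
conn = csr_matrix((np.ones(len(rows)), (rows, cols)), shape=(n, n))

# Permute rows and columns so nonzeros cluster near the diagonal,
# moving the matrix toward the block-diagonal form the GPU prefers.
perm = reverse_cuthill_mckee(conn, symmetric_mode=True)
reordered = conn[perm][:, perm]

print("bandwidth before:", bandwidth(conn))
print("bandwidth after: ", bandwidth(reordered))
```

After the permutation the nonzeros sit much closer to the main diagonal, which is the structural property the paper ties to the achievable SpMV-like throughput on the GPU.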