Global memory access modelling for efficient implementation of the lattice Boltzmann method on graphics processing units

Authors:
Christian Obrecht
Frédéric Kuznik
Bernard Tourancheau
Jean-Jacques Roux
Affiliations:
Centre de Thermique de Lyon, UMR, CNRS, INSA-Lyon, Université de Lyon, Villeurbanne Cedex, France
Centre de Thermique de Lyon, UMR, CNRS, INSA-Lyon, Université de Lyon, Villeurbanne Cedex, France
Laboratoire de l'Informatique du Parallélisme, UMR, CNRS, ENS de Lyon, INRIA, UCB Lyon 1, École Normale Supérieure de Lyon, Lyon Cedex 07, France
Centre de Thermique de Lyon, UMR, CNRS, INSA-Lyon, Université de Lyon, Villeurbanne Cedex, France
Venue:
VECPAR'10 Proceedings of the 9th international conference on High performance computing for computational science
Year:
2010

References (8)

A 3D incompressible thermal lattice Boltzmann model and its application to simulate natural convection in a cubic cavity. Journal of Computational Physics.
Optimization principles and application performance evaluation of a multithreaded GPU using CUDA. Proceedings of the 13th ACM SIGPLAN Symposium on Principles and practice of parallel programming.
Benchmarking GPUs to tune dense linear algebra. Proceedings of the 2008 ACM/IEEE conference on Supercomputing.
TeraFLOP computing on a desktop PC with GPUs for 3D CFD. International Journal of Computational Fluid Dynamics - Mesoscopic Methods And Their Applications To CFD.
Exploring New Architectures in Accelerating CFD for Air Force Applications. HPCMP-UGC '08 Proceedings of the 2008 DoD HPCMP Users Group Conference.
Power Consumption of GPUs from a Software Perspective. ICCS '09 Proceedings of the 9th International Conference on Computational Science: Part I.
LBM based flow simulation using GPU computing processor. Computers & Mathematics with Applications.
A new approach to the lattice Boltzmann method for graphics processing units. Computers & Mathematics with Applications.

Cited by (2)

The TheLMA project: Multi-GPU implementation of the lattice Boltzmann method. International Journal of High Performance Computing Applications.
Efficient GPU implementation of the linearly interpolated bounce-back boundary condition. Computers & Mathematics with Applications.

Abstract

In this work, we investigate the global memory access mechanism on recent GPUs. For this study, we wrote dedicated benchmark programs that allowed us to explore how global memory transactions are scheduled. From these observations, we formulate a model capable of estimating the execution time for a large class of applications. Our main goal is to facilitate the optimisation of regular data-parallel applications on GPUs. As an example, we describe our CUDA implementations of LBM flow solvers, for which our model estimated performance with less than 5% relative error.
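
To illustrate what estimating execution time from global memory traffic can look like, here is a minimal Python sketch. It is not the model from the paper, whose details are not given in this abstract. It assumes a simple bandwidth-bound estimate: each node update reads and writes one value per lattice velocity, and the kernel time is the data volume divided by a sustained bandwidth. The function names, the D3Q19 default (`q=19`), single precision (`bytes_per_value=4`), and the bandwidth figure in the usage lines are all illustrative assumptions.

```python
def bytes_per_node_update(q=19, bytes_per_value=4):
    """Global memory traffic for one node update, assuming q loads and q stores."""
    return 2 * q * bytes_per_value


def estimate_step_time(nodes, bandwidth_gb_s, q=19, bytes_per_value=4):
    """Estimated time in seconds for one time step of a memory-bound kernel.

    bandwidth_gb_s is an assumed sustained global memory throughput in GB/s
    (1 GB = 1e9 bytes), for instance taken from a copy benchmark.
    """
    bytes_moved = nodes * bytes_per_node_update(q, bytes_per_value)
    return bytes_moved / (bandwidth_gb_s * 1e9)


def estimate_mlups(bandwidth_gb_s, q=19, bytes_per_value=4):
    """Estimated throughput in million lattice-node updates per second."""
    per_node = bytes_per_node_update(q, bytes_per_value)
    return bandwidth_gb_s * 1e9 / per_node / 1e6


def relative_error(estimated, measured):
    """Relative error of an estimate against a measured value."""
    return abs(estimated - measured) / measured


if __name__ == "__main__":
    # Hypothetical inputs: a 128^3 lattice and a placeholder bandwidth of 100 GB/s.
    nodes = 128 ** 3
    print(estimate_step_time(nodes, bandwidth_gb_s=100.0))
    print(estimate_mlups(bandwidth_gb_s=100.0))
```

Comparing such an estimate with a measured run through `relative_error` is one way to check a prediction against a figure like the 5% bound quoted in the abstract. A real model would also need to account for how transactions are scheduled, which this sketch ignores.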