Emerging many-core processors, such as CUDA-capable NVIDIA GPUs, are promising platforms for regular parallel algorithms like the lattice Boltzmann method (LBM). Since the global memory of graphics devices has high latency and LBM is data-intensive, the memory access pattern is a key issue for achieving good performance. Whenever possible, global memory loads and stores should be coalesced and aligned, but the propagation phase of LBM can lead to frequent misaligned memory accesses. Most previous CUDA implementations of 3D LBM addressed this problem by using the low-latency on-chip shared memory. Instead, our CUDA implementation of LBM relies on carefully chosen data transfer schemes in global memory. For the 3D lid-driven cavity test case, we obtained up to 86% of the maximal global memory throughput on NVIDIA's GT200. We show that, as a consequence, highly efficient implementations of LBM on GPUs are possible, even for complex models.