Accelerating incompressible flow computations with a Pthreads-CUDA implementation on small-footprint multi-GPU platforms

Authors:
Julien C. Thibault;Inanc Senocak
Affiliations:
Department of Computer Science, Boise State University, Boise, USA 83725;Department of Mechanical and Biomedical Engineering, Boise State University, Boise, USA 83725
Venue:
The Journal of Supercomputing
Year:
2012

Abstract

Graphics processing units (GPUs), originally designed for graphics rendering, have emerged as massively parallel "co-processors" to the central processing unit (CPU). Small-footprint multi-GPU workstations with hundreds of processing elements can substantially accelerate compute-intensive simulation science applications. In this study, we describe the implementation of an incompressible-flow Navier-Stokes solver for multi-GPU workstation platforms. A shared-memory parallel code with identical numerical methods is also developed for multi-core CPUs to provide a fair comparison between CPUs and GPUs. Specifically, we adopt NVIDIA's Compute Unified Device Architecture (CUDA) programming model to implement the discretized form of the governing equations on a single GPU. Pthreads are then used to enable communication across multiple GPUs on a workstation. We use separate CUDA kernels to implement the projection algorithm that solves the incompressible fluid flow equations. Kernels are mapped to different memory spaces on the GPU depending on their arithmetic intensity, and this memory-hierarchy-specific implementation runs significantly faster. We present a systematic analysis of speedup and scaling using two generations of NVIDIA GPU architectures and compare single- and double-precision computational performance on the GPU. Using a quad-GPU platform for single-precision computations, we observe a speedup of two orders of magnitude relative to a serial CPU implementation. Our results demonstrate that multi-GPU workstations can serve as a cost-effective, small-footprint parallel computing platform that substantially accelerates computational fluid dynamics (CFD) simulations.
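The abstract says only that Pthreads coordinate the GPUs; it gives no implementation details. The following is a minimal CPU-only sketch of one common way such coordination can look, not the authors' code. It assumes one thread per subdomain, one ghost cell on each side, and a barrier-synchronized exchange of edge values through shared "host" storage. Python's `threading` module stands in for Pthreads, plain lists stand in for device memory, and a 1D Jacobi sweep stands in for the solver kernels. The names `solve_serial`, `solve_threaded`, `left_edge` and `right_edge` are hypothetical.

```python
import threading


def solve_serial(u0, iterations):
    """Reference: Jacobi sweeps on a 1D Laplace problem with fixed end values."""
    old = list(u0)
    for _ in range(iterations):
        new = old[:]
        for i in range(1, len(old) - 1):
            new[i] = 0.5 * (old[i - 1] + old[i + 1])
        old = new
    return old


def solve_threaded(u0, iterations, num_workers):
    """Same sweeps, with the interior split into one subdomain per worker thread.

    Each worker keeps a private copy of its cells plus one ghost cell per side
    (a stand-in for per-GPU memory) and swaps edge values with its neighbours
    through shared lists (a stand-in for host memory) after every sweep.
    """
    interior = len(u0) - 2
    if interior < num_workers:
        raise ValueError("each worker needs at least one interior cell")
    # bounds[w]..bounds[w+1]-1 are the global indices owned by worker w
    bounds = [1 + (interior * w) // num_workers for w in range(num_workers + 1)]
    left_edge = [None] * num_workers    # first owned value of each worker
    right_edge = [None] * num_workers   # last owned value of each worker
    results = [None] * num_workers
    barrier = threading.Barrier(num_workers)

    def worker(w):
        lo, hi = bounds[w], bounds[w + 1]
        local = list(u0[lo - 1:hi + 1])  # owned cells plus two ghost cells
        for _ in range(iterations):
            new = local[:]
            for i in range(1, len(local) - 1):
                new[i] = 0.5 * (local[i - 1] + local[i + 1])
            local = new
            # Publish the updated edge values for the neighbours.
            left_edge[w] = local[1]
            right_edge[w] = local[-2]
            barrier.wait()  # every edge value is published
            if w > 0:
                local[0] = right_edge[w - 1]
            if w < num_workers - 1:
                local[-1] = left_edge[w + 1]
            barrier.wait()  # every ghost cell is read before the next publish
        results[w] = local[1:-1]

    threads = [threading.Thread(target=worker, args=(w,)) for w in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    merged = [u0[0]]
    for part in results:
        merged.extend(part)
    merged.append(u0[-1])
    return merged


if __name__ == "__main__":
    u0 = [0.0] * 17 + [1.0]
    print(solve_threaded(u0, 200, 4) == solve_serial(u0, 200))
```

The two barriers per sweep separate publishing edge values from reading them, so no worker overwrites a value its neighbour has not yet consumed. In a Pthreads-CUDA setting the same pattern would typically involve device-to-host and host-to-device copies around the barriers, but that mapping is an assumption here rather than something the abstract states.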