Graphics processing units (GPUs), originally designed for graphics rendering, have emerged as massively parallel "co-processors" to the central processing unit (CPU). Small-footprint multi-GPU workstations with hundreds of processing elements can substantially accelerate compute-intensive simulation science applications. In this study, we describe the implementation of an incompressible-flow Navier-Stokes solver for multi-GPU workstation platforms. To provide a fair comparison between CPUs and GPUs, we also develop a shared-memory parallel code for multi-core CPUs that uses identical numerical methods. Specifically, we adopt NVIDIA's Compute Unified Device Architecture (CUDA) programming model to implement the discretized form of the governing equations on a single GPU, and then use Pthreads to enable communication across multiple GPUs on a workstation. The projection algorithm that solves the incompressible fluid flow equations is implemented as a set of separate CUDA kernels. Each kernel uses the GPU memory space best suited to its arithmetic intensity, and this memory-hierarchy-specific implementation yields significantly faster performance. We present a systematic analysis of speedup and scaling on two generations of NVIDIA GPU architectures and compare single- and double-precision computational performance on the GPU. On a quad-GPU platform in single precision, we observe a speedup of two orders of magnitude relative to a serial CPU implementation. Our results demonstrate that multi-GPU workstations can serve as a cost-effective, small-footprint parallel computing platform that substantially accelerates computational fluid dynamics (CFD) simulations.
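For orientation, one common textbook form of the projection algorithm mentioned above is sketched below. It is an illustration under stated assumptions, not necessarily the scheme used in this study: it assumes a first-order explicit Euler time step and constant density, and the abstract does not specify the time integration or spatial discretization. The symbols are introduced here for illustration only: $\mathbf{u}$ is velocity, $p$ is pressure, $\rho$ is density, $\nu$ is kinematic viscosity, $\Delta t$ is the time step, and $\mathbf{u}^{*}$ is the intermediate (not yet divergence-free) velocity.

```latex
% Assumed form: explicit Euler, constant density (not stated in the abstract).
\begin{align}
  % 1. Predictor: advance momentum without the pressure gradient
  \mathbf{u}^{*} &= \mathbf{u}^{n}
      + \Delta t \left[ -(\mathbf{u}^{n} \cdot \nabla)\,\mathbf{u}^{n}
      + \nu \nabla^{2} \mathbf{u}^{n} \right] \\
  % 2. Pressure Poisson equation: enforces \nabla \cdot \mathbf{u}^{n+1} = 0
  \nabla^{2} p^{n+1} &= \frac{\rho}{\Delta t}\, \nabla \cdot \mathbf{u}^{*} \\
  % 3. Corrector: project the intermediate velocity onto a divergence-free field
  \mathbf{u}^{n+1} &= \mathbf{u}^{*} - \frac{\Delta t}{\rho}\, \nabla p^{n+1}
\end{align}
```

Taking the divergence of the corrector step and substituting the Poisson equation gives $\nabla \cdot \mathbf{u}^{n+1} = 0$, which is the incompressibility constraint. In a formulation like this, the predictor, the Poisson solve, and the corrector are distinct stages with different ratios of arithmetic to memory traffic, which is consistent with implementing them as separate kernels.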