Direct numerical simulations of turbulence are optimized for up to 192 graphics processors. The results from two large GPU clusters are compared to the performance of corresponding CPU clusters. A number of important algorithm changes are necessary to access the full computational power of graphics processors, and these adaptations are discussed. It is shown that the handling of subdomain communication becomes even more critical when using GPU-based supercomputers. The potential for overlapping MPI communication with GPU computation is analyzed and then optimized. Detailed timings reveal that the internal calculations are now so efficient that the operations related to MPI communication are the primary scaling bottleneck at all but the very largest problem sizes that can fit on the hardware. This work gives a glimpse of the CFD performance issues that will dominate many hardware platforms in the near future.
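The overlap of MPI communication with GPU computation mentioned above can be sketched as follows. This is an illustrative sketch, not the authors' implementation: the kernel names, the one-dimensional halo decomposition, and the sizes are hypothetical assumptions, while the CUDA stream and non-blocking MPI calls themselves are standard.

```cuda
// Sketch: overlap the subdomain halo exchange with interior GPU work by
// putting device<->host halo copies on one CUDA stream while the interior
// update runs on another. All names and sizes below are hypothetical.
#include <mpi.h>
#include <cuda_runtime.h>

#define N_HALO 4096  // hypothetical number of points in one halo face

__global__ void interior_kernel(double *u) { /* update interior points */ }
__global__ void boundary_kernel(double *u) { /* update subdomain faces  */ }

void timestep(double *d_u, double *h_send, double *h_recv,
              int left, int right, MPI_Comm comm)
{
    cudaStream_t s_comm, s_comp;
    cudaStreamCreate(&s_comm);
    cudaStreamCreate(&s_comp);

    // 1. Stage the outgoing halo layer into pinned host memory
    //    on the communication stream.
    cudaMemcpyAsync(h_send, d_u, N_HALO * sizeof(double),
                    cudaMemcpyDeviceToHost, s_comm);

    // 2. Launch the interior update concurrently on the compute stream;
    //    it needs no halo data, so it proceeds during the exchange.
    interior_kernel<<<256, 256, 0, s_comp>>>(d_u);

    // 3. Exchange halos with the neighbor ranks while the interior
    //    kernel is still running on the device.
    cudaStreamSynchronize(s_comm);  // halo data is now on the host
    MPI_Request req[2];
    MPI_Irecv(h_recv, N_HALO, MPI_DOUBLE, left,  0, comm, &req[0]);
    MPI_Isend(h_send, N_HALO, MPI_DOUBLE, right, 0, comm, &req[1]);
    MPI_Waitall(2, req, MPI_STATUSES_IGNORE);

    // 4. Return the received halo to the device and finish the face
    //    points; stream order guarantees the copy completes first.
    cudaMemcpyAsync(d_u, h_recv, N_HALO * sizeof(double),
                    cudaMemcpyHostToDevice, s_comm);
    boundary_kernel<<<16, 256, 0, s_comm>>>(d_u);

    cudaDeviceSynchronize();
    cudaStreamDestroy(s_comm);
    cudaStreamDestroy(s_comp);
}
```

The pattern only hides communication cost while the interior kernel outlasts the exchange; consistent with the timings reported above, once the kernels become efficient enough the MPI operations on the critical path re-emerge as the scaling bottleneck.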