Direct numerical simulations of turbulence are optimized for up to 192 graphics processors. The results from two large GPU clusters are compared to the performance of corresponding CPU clusters. A number of important algorithm changes are necessary to access the full computational power of graphics processors, and these adaptations are discussed. It is shown that the handling of subdomain communication becomes even more critical when using GPU-based supercomputers. The potential for overlapping MPI communication with GPU computation is analyzed and then optimized. Detailed timings reveal that the internal calculations are now so efficient that the operations related to MPI communication are the primary scaling bottleneck at all but the very largest problem sizes that can fit on the hardware. This work gives a glimpse of the CFD performance issues that will dominate many hardware platforms in the near future.
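The overlap of MPI communication with GPU computation mentioned above can be sketched as follows. This is an illustrative sketch, not the authors' implementation: the kernel names, the one-dimensional halo decomposition, and the sizes are hypothetical assumptions, while the CUDA stream and non-blocking MPI calls themselves are standard.

```cuda
// Sketch: overlap the subdomain halo exchange with interior GPU work by
// putting device<->host halo copies on one CUDA stream while the interior
// update runs on another. All names and sizes below are hypothetical.
#include <mpi.h>
#include <cuda_runtime.h>

#define N_HALO 4096  // hypothetical number of points in one halo face

__global__ void interior_kernel(double *u) { /* update interior points */ }
__global__ void boundary_kernel(double *u) { /* update subdomain faces  */ }

void timestep(double *d_u, double *h_send, double *h_recv,
              int left, int right, MPI_Comm comm)
{
    cudaStream_t s_comm, s_comp;
    cudaStreamCreate(&s_comm);
    cudaStreamCreate(&s_comp);

    // 1. Stage the outgoing halo layer into pinned host memory
    //    on the communication stream.
    cudaMemcpyAsync(h_send, d_u, N_HALO * sizeof(double),
                    cudaMemcpyDeviceToHost, s_comm);

    // 2. Launch the interior update concurrently on the compute stream;
    //    it needs no halo data, so it proceeds during the exchange.
    interior_kernel<<<256, 256, 0, s_comp>>>(d_u);

    // 3. Exchange halos with the neighbor ranks while the interior
    //    kernel is still running on the device.
    cudaStreamSynchronize(s_comm);  // halo data is now on the host
    MPI_Request req[2];
    MPI_Irecv(h_recv, N_HALO, MPI_DOUBLE, left,  0, comm, &req[0]);
    MPI_Isend(h_send, N_HALO, MPI_DOUBLE, right, 0, comm, &req[1]);
    MPI_Waitall(2, req, MPI_STATUSES_IGNORE);

    // 4. Return the received halo to the device and finish the face
    //    points; stream order guarantees the copy completes first.
    cudaMemcpyAsync(d_u, h_recv, N_HALO * sizeof(double),
                    cudaMemcpyHostToDevice, s_comm);
    boundary_kernel<<<16, 256, 0, s_comm>>>(d_u);

    cudaDeviceSynchronize();
    cudaStreamDestroy(s_comm);
    cudaStreamDestroy(s_comp);
}
```

The pattern only hides communication cost while the interior kernel outlasts the exchange; consistent with the timings reported above, once the kernels become efficient enough the MPI operations on the critical path re-emerge as the scaling bottleneck.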