Large-scale simulations on multiple Graphics Processing Units (GPUs) for the direct simulation Monte Carlo method

Authors:
C.-C. Su; M. R. Smith; F.-A. Kuo; J.-S. Wu; C.-W. Hsieh; K.-C. Tseng
Affiliations:
Department of Mechanical Engineering, National Chiao Tung University, Hsinchu, Taiwan;Department of Mechanical Engineering, National Cheng Kung University, Tainan, Taiwan;Department of Mechanical Engineering, National Chiao Tung University, Hsinchu, Taiwan and National Center for High-Performance Computing, National Applied Research Laboratories, Hsinchu, Taiwan;Department of Mechanical Engineering, National Chiao Tung University, Hsinchu, Taiwan and National Center for High-Performance Computing, National Applied Research Laboratories, Hsinchu, Taiwan;National Center for High-Performance Computing, National Applied Research Laboratories, Hsinchu, Taiwan;National Space Organization, National Applied Research Laboratories, Hsinchu, Taiwan
Venue:
Journal of Computational Physics
Year:
2012

Abstract

In this study, the application of the two-dimensional direct simulation Monte Carlo (DSMC) method using an MPI-CUDA parallelization paradigm on Graphics Processing Unit (GPU) clusters is presented. An all-device (i.e., GPU) computational approach is adopted in which the entire computation, including particle moving, indexing, particle collisions and state sampling, is performed on the GPU device, leaving the CPU idle during all stages. Communication between the GPU and the host is performed only to enable multiple-GPU computation. Results show that, compared with a single core of an Intel Xeon X5670 CPU, the computational expense is reduced by factors of 15 and 185 when using a single GPU and 16 GPUs, respectively. The demonstrated parallel efficiency is 75% when using 16 GPUs, relative to a single GPU, for simulations with 30 million simulated particles. Finally, several very large-scale simulations in the near-continuum regime are presented to demonstrate the excellent capability of the current parallel DSMC method.
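The abstract lists particle indexing (grouping particles by the cell they occupy, so that collisions can be selected cell by cell) as one of the stages kept on the GPU, but it does not say how this is done. A common data-parallel approach, assumed here and not confirmed by the source, is a counting sort built from three passes: a per-cell histogram, an exclusive prefix sum, and a scatter. The Python below is a serial sketch of those three passes only; the function `index_particles` and the names `cell_of`, `count`, `start` and `order` are hypothetical and do not come from the paper.

```python
from typing import List, Tuple


def index_particles(cell_of: List[int], n_cells: int) -> Tuple[List[int], List[int], List[int]]:
    """Group particle indices by cell using a counting sort (illustrative sketch).

    cell_of[p] is the cell index of particle p (hypothetical layout).
    Returns (count, start, order):
      count[c]  number of particles in cell c
      start[c]  offset of cell c's first entry in `order`
      order     particle indices grouped cell by cell
    """
    # Pass 1, histogram: on a GPU this would typically be one thread per
    # particle doing an atomic increment of count[cell]. Here it is a loop.
    count = [0] * n_cells
    for c in cell_of:
        count[c] += 1

    # Pass 2, exclusive prefix sum: start[c] = count[0] + ... + count[c-1].
    start = [0] * n_cells
    running = 0
    for c in range(n_cells):
        start[c] = running
        running += count[c]

    # Pass 3, scatter: write each particle index into the next free slot
    # of its cell. `fill` is a per-cell write cursor initialised to start.
    fill = start[:]
    order = [0] * len(cell_of)
    for p, c in enumerate(cell_of):
        order[fill[c]] = p
        fill[c] += 1

    return count, start, order


if __name__ == "__main__":
    count, start, order = index_particles([2, 0, 2, 1, 0], 3)
    # Particles in cell 2 are order[start[2] : start[2] + count[2]].
    print(count, start, order)
```

For five particles in cells `[2, 0, 2, 1, 0]`, this yields `count = [2, 1, 2]`, `start = [0, 2, 3]` and `order = [1, 4, 3, 0, 2]`, so the collision stage can read each cell's particles as a contiguous slice. The serial loop keeps particles in their original order within a cell. A GPU version that claims slots with atomic operations would generally not guarantee that order, which is harmless for DSMC because collision partners within a cell are chosen at random.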