Large-scale simulations on multiple Graphics Processing Units (GPUs) for the direct simulation Monte Carlo method

  • Authors:
  • C.-C. Su; M. R. Smith; F.-A. Kuo; J.-S. Wu; C.-W. Hsieh; K.-C. Tseng

  • Affiliations:
  • Department of Mechanical Engineering, National Chiao Tung University, Hsinchu, Taiwan; Department of Mechanical Engineering, National Cheng Kung University, Tainan, Taiwan; Department of Mechanical Engineering, National Chiao Tung University, Hsinchu, Taiwan and National Center for High-Performance Computing, National Applied Research Laboratories, Hsinchu, Taiwan; Department of Mechanical Engineering, National Chiao Tung University, Hsinchu, Taiwan and National Center for High-Performance Computing, National Applied Research Laboratories, Hsinchu, Taiwan; National Center for High-Performance Computing, National Applied Research Laboratories, Hsinchu, Taiwan; National Space Organization, National Applied Research Laboratories, Hsinchu, Taiwan

  • Venue:
  • Journal of Computational Physics
  • Year:
  • 2012

Quantified Score

Hi-index 31.45

Abstract

In this study, the application of the two-dimensional direct simulation Monte Carlo (DSMC) method using an MPI-CUDA parallelization paradigm on Graphics Processing Unit (GPU) clusters is presented. An all-device (i.e., GPU) computational approach is adopted, in which every stage of the computation, including particle moving, indexing, particle collisions and state sampling, is performed on the GPU, leaving the CPU idle. Communication between the GPU and the host is performed only to enable multiple-GPU computation. Results show that the computational expense is reduced by factors of 15 and 185 when using a single GPU and 16 GPUs, respectively, compared to a single core of an Intel Xeon X5670 CPU. The demonstrated parallel efficiency is 75% when using 16 GPUs as compared to a single GPU for simulations using 30 million simulated particles. Finally, several very large-scale simulations in the near-continuum regime are presented to demonstrate the excellent capability of the current parallel DSMC method.
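
The relation between the quoted speedups and the quoted parallel efficiency can be checked with a short calculation. The sketch below is illustrative only: the function name `parallel_efficiency` is hypothetical, and it assumes that both speedups were measured against the same single-core CPU baseline on the same test case, which the abstract does not state explicitly.

```python
def parallel_efficiency(speedup_single, speedup_multi, n_devices):
    """Efficiency of n_devices relative to one device.

    Both speedups are assumed to be measured against the same
    CPU baseline (an assumption, not confirmed by the abstract).
    """
    # Speedup of the multi-GPU run relative to the single-GPU run.
    relative_speedup = speedup_multi / speedup_single
    # Ideal scaling would give relative_speedup == n_devices.
    return relative_speedup / n_devices


# Speedups quoted in the abstract: 15x on 1 GPU, 185x on 16 GPUs.
eff = parallel_efficiency(15.0, 185.0, 16)
print(f"{eff:.1%}")  # prints "77.1%"
```

Under that assumption, the quoted speedups imply an efficiency of roughly 77%, close to the 75% reported for the 30-million-particle case. The small gap suggests that the two figures may come from different problem sizes.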