Scaling fast multipole methods up to 4000 GPUs

  • Authors: Rio Yokota; Lorena Barba; Tetsu Narumi; Kenji Yasuoka

  • Affiliations: King Abdullah University of Science and Technology, Thuwal, Saudi Arabia; Boston University, Boston, MA; University of Electro-Communications, Chofu, Tokyo, Japan; Keio University, Hiyoshi, Yokohama, Japan

  • Venue: Proceedings of the ATIP/A*CRC Workshop on Accelerator Technologies for High-Performance Computing: Does Asia Lead the Way?
  • Year: 2012

Abstract

The Fast Multipole Method (FMM) is a hierarchical N-body algorithm with linear complexity, high arithmetic intensity, high data locality, hierarchical communication patterns, and no global synchronization. The combination of these features allows the FMM to scale well on large GPU-based systems and to use their compute capability effectively. We present a 1 PFlop/s calculation of isotropic turbulence with 64 billion vortex particles using 4096 GPUs on the TSUBAME 2.0 system.