Scaling fast multipole methods up to 4000 GPUs

Authors:
Rio Yokota;Lorena Barba;Tetsu Narumi;Kenji Yasuoka
Affiliations:
King Abdullah University of Science and Technology, Thuwal, Saudi Arabia;Boston University, Boston, MA;University of Electro-Communications, Chofu, Tokyo, Japan;Keio University, Hiyoshi, Yokohama, Japan
Venue:
Proceedings of the ATIP/A*CRC Workshop on Accelerator Technologies for High-Performance Computing: Does Asia Lead the Way?
Year:
2012

Abstract

The Fast Multipole Method (FMM) is a hierarchical N-body algorithm with linear complexity, high arithmetic intensity, high data locality, hierarchical communication patterns, and no global synchronization. The combination of these features allows the FMM to scale well on large GPU-based systems and to use their compute capability effectively. We present a 1 PFlop/s calculation of isotropic turbulence with 64 billion vortex particles using 4096 GPUs on the TSUBAME 2.0 system.
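
To put the aggregate figures quoted above on a per-device scale, the short Python sketch below divides them evenly across the GPUs. It is an illustration only, not part of the published work. It assumes a perfectly even split, decimal prefixes (1 PFlop/s taken as 10^15 flop/s, 64 billion taken as 64 x 10^9), and the function and key names are hypothetical.

```python
def per_gpu_figures(total_flops=1e15, particles=64e9, gpus=4096):
    """Split the abstract's aggregate figures evenly across GPUs.

    Assumptions (not stated in the source): decimal prefixes and a
    perfectly uniform distribution of work and particles.
    """
    return {
        # Sustained rate per GPU, converted from flop/s to GFlop/s.
        "gflops_per_gpu": total_flops / gpus / 1e9,
        # Number of vortex particles each GPU would hold.
        "particles_per_gpu": particles / gpus,
    }


figures = per_gpu_figures()
print(f"{figures['gflops_per_gpu']:.1f} GFlop/s per GPU")
print(f"{figures['particles_per_gpu']:,.0f} particles per GPU")
```

Under these assumptions each GPU sustains roughly 244 GFlop/s and holds about 15.6 million particles. The real per-GPU load would depend on how the particles are partitioned.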