The Fast Multipole Method (FMM) is a hierarchical N-body algorithm with linear complexity, high arithmetic intensity, high data locality, hierarchical communication patterns, and no global synchronization. The combination of these features allows the FMM to scale well on large GPU-based systems and to use their compute capability effectively. We present a 1 PFlop/s calculation of isotropic turbulence with 64 billion vortex particles using 4096 GPUs on the TSUBAME 2.0 system.
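To illustrate the hierarchical structure that such N-body methods exploit, the sketch below implements a much-simplified 1D Barnes-Hut-style treecode in Python: far cells are approximated by a single monopole at their center of mass, near cells are evaluated directly. Everything in it (the interval tree, the opening criterion theta, the 1/r kernel, the leaf size) is a hypothetical simplification for illustration only; it is not the paper's vortex-particle FMM solver, which additionally uses multipole and local expansions with cell-to-cell translations to reach O(N) complexity.

# Minimal, hypothetical sketch of a hierarchical N-body evaluation in 1D.
# Not the paper's method: a Barnes-Hut-style treecode, shown only to
# illustrate the tree hierarchy and data locality that FMM-type codes exploit.

import random

class Cell:
    def __init__(self, lo, hi, idx):
        self.lo, self.hi = lo, hi      # interval covered by this cell
        self.idx = idx                 # indices of particles in this cell
        self.children = []
        self.mass = 0.0                # total mass (monopole)
        self.com = 0.0                 # center of mass

def build(x, m, lo, hi, idx, leaf_size=8):
    """Recursively bisect the domain; accumulate monopoles on the way up."""
    cell = Cell(lo, hi, idx)
    if len(idx) > leaf_size:
        mid = 0.5 * (lo + hi)
        left = [i for i in idx if x[i] < mid]
        right = [i for i in idx if x[i] >= mid]
        cell.children = [build(x, m, lo, mid, left, leaf_size),
                         build(x, m, mid, hi, right, leaf_size)]
    cell.mass = sum(m[i] for i in idx)
    if cell.mass > 0.0:
        cell.com = sum(m[i] * x[i] for i in idx) / cell.mass
    return cell

def potential(xi, x, m, cell, theta=0.5):
    """Evaluate a 1/r-style potential at xi, descending only into nearby cells."""
    if not cell.idx:
        return 0.0
    size = cell.hi - cell.lo
    dist = abs(xi - cell.com)
    if dist > 0.0 and size / dist < theta:
        return cell.mass / dist        # far cell: single monopole term
    if not cell.children:
        return sum(m[j] / abs(xi - x[j])   # leaf: direct sum, skipping self
                   for j in cell.idx if x[j] != xi)
    return sum(potential(xi, x, m, c, theta) for c in cell.children)

if __name__ == "__main__":
    random.seed(0)
    n = 2000
    x = [random.random() for _ in range(n)]
    m = [1.0 / n] * n
    root = build(x, m, 0.0, 1.0, list(range(n)))
    approx = potential(x[0], x, m, root)
    direct = sum(m[j] / abs(x[0] - x[j]) for j in range(n) if j != 0)
    print(f"treecode {approx:.6f}  direct {direct:.6f}")

Because each target interacts with only a handful of well-separated cells per tree level, the work per particle stays roughly logarithmic in this sketch; replacing the single monopole with truncated expansions and cell-to-cell translations, as in an FMM, is what removes the remaining log factor.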