The fast multipole method (FMM) allows the rapid approximate evaluation of sums of radial basis functions. For a specified accuracy ε, the method scales as O(N) in both time and memory, compared to O(N^2) for the direct method, which allows larger problems to be solved with given resources. Graphics processing units (GPUs) are increasingly viewed as data-parallel compute coprocessors that can provide significant computational performance at low price. We describe acceleration of the FMM using the data-parallel GPU architecture. The FMM has a complex hierarchical (adaptive) structure, which is not easily implemented on data-parallel processors. We describe strategies for parallelization of all components of the FMM, develop a model to explain the performance of the algorithm on the GPU architecture, and determine optimal settings for the FMM on the GPU. These optimal settings differ from those on conventional CPUs. Some innovations in the FMM algorithm, including the use of modified stencils, real polynomial basis functions for the Laplace kernel, and decompositions of the translation operators, are also described. We obtained accelerations of the Laplace kernel FMM on a single NVIDIA GeForce 8800 GTX GPU by factors of 30 to 60 compared to a serial CPU FMM implementation. For a problem with a million sources, the summations involved are performed in approximately one second. This performance is equivalent to solving the same problem by straightforward summation at a rate of 43 teraflops.
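To make the O(N^2) baseline concrete, the following is a minimal sketch of the direct (straightforward) summation that the FMM replaces, written for the Laplace kernel 1/r. It is an illustration only and not the implementation described above: the function names `direct_potential` and `pair_count`, the choice of Python, and the convention of skipping coincident points are all assumptions made for this example.

```python
import math


def direct_potential(sources, charges, targets):
    """Brute-force Laplace potential: phi(y) = sum_i q_i / |y - x_i|.

    Hypothetical helper for illustration. Cost is proportional to
    len(sources) * len(targets), i.e. O(N^2) when both have size N.
    """
    potentials = []
    for y in targets:
        total = 0.0
        for x, q in zip(sources, charges):
            r = math.dist(x, y)  # Euclidean distance (Python 3.8+)
            if r > 0.0:  # assumption: skip a source that coincides with the target
                total += q / r
        potentials.append(total)
    return potentials


def pair_count(n_sources, n_targets):
    """Number of source-target kernel evaluations done by direct summation."""
    return n_sources * n_targets


if __name__ == "__main__":
    sources = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    charges = [1.0, 2.0]
    targets = [(0.0, 0.0, 2.0)]
    print(direct_potential(sources, charges, targets))
    # Pair interactions for a million sources evaluated at a million targets
    print(pair_count(10**6, 10**6))
```

Assuming the targets coincide with the million sources, direct summation requires 10^12 kernel evaluations, which is the workload behind the equivalent teraflop rate quoted above. The number of floating-point operations assumed per evaluation is not stated in this text.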