We present a new adaptive fast multipole algorithm and its parallel implementation. The algorithm is kernel-independent in the sense that the evaluation of pairwise interactions does not rely on any analytic expansions; it requires only kernel evaluations. The new method provides the enabling technology for many important problems in computational science and engineering, including viscous flows, fracture mechanics, and screened Coulombic interactions. Our MPI-based parallel implementation logically separates the computation and communication phases to avoid synchronization in the upward and downward computation passes, and thus allows us to fully overlap computation and communication. We measure isogranular and fixed-size scalability for a variety of kernels on the Pittsburgh Supercomputing Center's TCS-1 AlphaServer on up to 3000 processors. We have solved viscous flow problems with up to 2.1 billion unknowns, achieving 1.6 Tflops/s peak performance and 1.13 Tflops/s sustained performance.
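To illustrate what "kernel-independent" means in practice, the sketch below (not code from the paper; all names are hypothetical) shows a direct O(NM) pairwise evaluation that touches the kernel only through point evaluations K(x, y). Because nothing depends on analytic expansions of a specific kernel, swapping the Laplace kernel for a screened Coulomb (Yukawa) kernel requires changing only the function passed in; a kernel-independent FMM accelerates this same sum to a prescribed tolerance.

```python
import numpy as np

def laplace_kernel(r):
    # Free-space Green's function for the 3-D Laplace equation: 1 / (4*pi*r).
    return 1.0 / (4.0 * np.pi * r)

def screened_coulomb_kernel(r, lam=1.0):
    # Yukawa (screened Coulomb) kernel: exp(-lam*r) / (4*pi*r).
    return np.exp(-lam * r) / (4.0 * np.pi * r)

def direct_sum(targets, sources, charges, kernel):
    """Direct O(N*M) summation using only kernel evaluations.

    targets: (N, 3) array of evaluation points
    sources: (M, 3) array of source locations
    charges: (M,)  array of source strengths
    kernel:  callable mapping distances r > 0 to kernel values
    """
    potentials = np.zeros(len(targets))
    for i, x in enumerate(targets):
        r = np.linalg.norm(sources - x, axis=1)
        mask = r > 0.0  # skip self-interactions when a target coincides with a source
        potentials[i] = np.sum(charges[mask] * kernel(r[mask]))
    return potentials
```

For example, a unit charge at the origin evaluated at unit distance gives 1/(4*pi) with the Laplace kernel, and the same call with `screened_coulomb_kernel` handles the screened Coulombic case mentioned above with no other changes.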