We present fast adaptive parallel algorithms to compute the sum of N Gaussians at N points. Direct sequential computation of this sum would take $O(N^2)$ time. The parallel time complexity estimates for our algorithms are $O(N/n_p)$ for uniform point distributions and $O\left((N/n_p)\log(N/n_p) + n_p\log n_p\right)$ for nonuniform distributions using $n_p$ CPUs. We incorporate a plane-wave representation of the Gaussian kernel, which permits “diagonal translation”. We use parallel octrees and a new scheme for translating the plane waves to efficiently handle nonuniform distributions. Computing the transform to six-digit accuracy at 120 billion points took approximately 140 seconds using 4096 cores on the Jaguar supercomputer at Oak Ridge National Laboratory. Our implementation is kernel-independent and can handle other “Gaussian-type” kernels even when an explicit analytic expression for the kernel is not known. These algorithms form a new class of core computational machinery for solving parabolic PDEs on massively parallel architectures.
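To make the quadratic baseline concrete, the following is a minimal sketch of the direct (discrete) Gauss transform that the fast algorithm accelerates. The function name, the array layout, and the bandwidth parameter `delta` are illustrative assumptions, not notation taken from the paper:

```python
import numpy as np

def direct_gauss_transform(sources, targets, weights, delta):
    """Direct O(N*M) evaluation of G(x_j) = sum_i q_i exp(-|x_j - y_i|^2 / delta).

    This is the quadratic-cost reference computation; the fast algorithm
    replaces it with octree-accelerated plane-wave expansions. All names
    here (including `delta` as the Gaussian bandwidth) are illustrative.

    sources : (N, d) array of source locations y_i
    targets : (M, d) array of target locations x_j
    weights : (N,)  array of source strengths q_i
    """
    # Pairwise squared distances between every target and every source,
    # computed via broadcasting: result has shape (M, N).
    d2 = np.sum((targets[:, None, :] - sources[None, :, :]) ** 2, axis=-1)
    # Apply the Gaussian kernel and sum over sources for each target.
    return np.exp(-d2 / delta) @ weights
```

Both time and memory here scale as the product of source and target counts, which is exactly why an adaptive fast transform is needed at the problem sizes reported above.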