This paper presents a parallel algorithm, implemented on graphics processing units (GPUs), for rapidly evaluating spatial convolutions between the Helmholtz potential and a large-scale source distribution. The algorithm implements a non-uniform grid interpolation method (NGIM), which uses amplitude and phase compensation and spatial interpolation from a sparse grid to compute the field outside a source domain. The NGIM reduces the computational cost of the direct field evaluation at N observers due to N co-located sources from O(N^2) to O(N) in the static and low-frequency regimes, to O(N log N) in the high-frequency regime, and to intermediate costs in the mixed-frequency regime. Memory requirements scale as O(N) in all frequency regimes. Achieving optimal performance on the respective platforms requires several important differences between the CPU and GPU implementations of the NGIM. In particular, in the CPU implementations all operations that can be pre-computed are evaluated and stored in memory during a preprocessing stage; this reduces the computational time but significantly increases the memory consumption. In the GPU implementations, where memory handling is often a critical bottleneck, several special memory-handling techniques are used to accelerate the computations. The significant latency of GPU global memory access is hidden by coalesced reads, which require arranging the relevant array elements in contiguous parts of memory. In contrast to the CPU version, most of the steps in the GPU implementations are executed on the fly, and only the necessary arrays are kept in memory. This significantly reduces memory consumption, increases the problem size N that can be handled, and reduces the computational time on the GPUs. The obtained GPU-to-CPU speed-up ratios range from 150 to 400, depending on the required accuracy and problem size. The presented method and its CPU and GPU implementations can find important applications in various fields of physics and engineering.
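For context, the spatial convolution being accelerated is the free-space Helmholtz potential sum u(r_m) = sum_n q_n exp(jk|r_m - r_n|)/|r_m - r_n| (up to a constant factor), whose direct evaluation at N observers due to N sources costs O(N^2). The CUDA kernel below is a minimal sketch of that O(N^2) baseline, with one thread per observer; the kernel name, the single-precision float2 complex arithmetic, and the array layout are illustrative assumptions rather than the paper's actual implementation.

```cuda
#include <cuda_runtime.h>
#include <math.h>

// Direct O(N^2) evaluation of the Helmholtz potential
//   u(r_m) = sum_n q_n * exp(j*k*|r_m - r_n|) / |r_m - r_n|
// at nObs observers due to nSrc sources. This is the baseline that the
// NGIM replaces with amplitude/phase compensation and interpolation from
// a sparse non-uniform grid. One thread evaluates the field at one observer.
__global__ void helmholtz_direct(const float3* __restrict__ obs,   // observer positions
                                 const float3* __restrict__ src,   // source positions
                                 const float2* __restrict__ q,     // complex source amplitudes
                                 float2* __restrict__ field,       // complex output field
                                 int nObs, int nSrc, float k)
{
    int m = blockIdx.x * blockDim.x + threadIdx.x;
    if (m >= nObs) return;

    float2 acc = make_float2(0.0f, 0.0f);
    float3 rm = obs[m];
    for (int n = 0; n < nSrc; ++n) {
        float dx = rm.x - src[n].x;
        float dy = rm.y - src[n].y;
        float dz = rm.z - src[n].z;
        float r  = sqrtf(dx * dx + dy * dy + dz * dz);
        if (r == 0.0f) continue;                 // skip the self term for co-located points
        float s, c;
        sincosf(k * r, &s, &c);                  // exp(j*k*r) = cos(kr) + j*sin(kr)
        float inv_r = 1.0f / r;
        float gr = c * inv_r, gi = s * inv_r;    // Green's function G = exp(j*k*r)/r
        acc.x += q[n].x * gr - q[n].y * gi;      // accumulate q_n * G (complex multiply)
        acc.y += q[n].x * gi + q[n].y * gr;
    }
    field[m] = acc;
}
```

The NGIM avoids this brute-force loop: in the standard NGIM construction, the field radiated by a source subdomain with center r_c is first evaluated on a sparse non-uniform grid, the rapidly oscillating factor exp(jk|r - r_c|)/|r - r_c| is divided out (the amplitude and phase compensation mentioned above), the remaining slowly varying function is interpolated to the observers, and the factor is then restored. This is what yields the O(N) to O(N log N) scaling quoted in the abstract.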
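The remark on coalesced reading can be illustrated with a short sketch: when consecutive threads of a warp read consecutive global-memory addresses, the hardware merges the accesses into a few wide transactions, which hides much of the memory latency; strided or scattered accesses do not coalesce and expose the full latency. The two kernels below are a generic illustration of this distinction and are not taken from the paper.

```cuda
#include <cuda_runtime.h>

// Coalesced access: thread i reads element i, so each warp touches one
// contiguous segment of global memory and its reads are merged into a
// small number of wide transactions.
__global__ void scale_coalesced(const float* __restrict__ in,
                                float* __restrict__ out,
                                int n, float a)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = a * in[i];
}

// Strided (non-coalesced) access: thread i reads element i*stride, so the
// addresses touched by a warp are scattered and each read may become a
// separate memory transaction, exposing the full global-memory latency.
__global__ void scale_strided(const float* __restrict__ in,
                              float* __restrict__ out,
                              int n, int stride, float a)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i * stride < n) out[i * stride] = a * in[i * stride];
}
```

This is why the abstract notes that coalescing requires arranging many array elements in contiguous parts of memory: the grid samples, interpolation data, and source/observer arrays must be laid out so that neighbouring threads touch neighbouring addresses.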