This work presents a highly optimized computational framework for the Discrete Dipole Approximation, a numerical method for calculating the optical properties associated with a target of arbitrary geometry that is widely used in atmospheric, astrophysical and industrial simulations. Core optimizations include the bit-fielding of integer data and iterative methods that complement a new Discrete Fourier Transform (DFT) kernel, which efficiently calculates the matrix-vector products required by these iterative solution schemes. The new kernel performs the requisite 3-D DFTs as ensembles of 1-D transforms, and by doing so, is able to reduce the number of constituent 1-D transforms by 60% and the memory by over 80%. The optimizations also facilitate the use of parallel techniques to further enhance the performance. Complete OpenMP-based shared-memory and MPI-based distributed-memory implementations have been created to take full advantage of the various architectures. Several benchmarks of the new framework indicate extremely favorable performance and scalability.
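
The idea of performing a 3-D DFT as an ensemble of 1-D transforms can be illustrated with a minimal sketch. The abstract does not say how the kernel reduces the transform count, so the sketch rests on an assumption: the data occupy only a corner of a zero-padded grid (as is common when a matrix-vector product is evaluated as an FFT-based convolution), and 1-D lines known to be entirely zero are skipped. The names `dft1d`, `dft3d_padded`, and `dft3d_direct` are hypothetical, a naive O(n^2) DFT stands in for a real FFT routine, and the sketch does not attempt to reproduce the 60% figure quoted above.

```python
import cmath

def dft1d(x):
    """Naive O(n^2) 1-D DFT; a stand-in for a real FFT routine."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]

def dft3d_padded(a, mx, my):
    """3-D DFT of a[x][y][z] as three passes of 1-D transforms.

    Assumption: a is zero wherever x >= mx or y >= my (zero padding),
    so lines that are known to be all zero are never transformed.
    Returns (result, number of 1-D transforms performed).
    """
    nx, ny, nz = len(a), len(a[0]), len(a[0][0])
    out = [[[complex(v) for v in row] for row in plane] for plane in a]
    count = 0
    # Pass 1 (along z): only lines with x < mx and y < my hold data.
    for i in range(mx):
        for j in range(my):
            out[i][j] = dft1d(out[i][j])
            count += 1
    # Pass 2 (along y): planes with x >= mx are still entirely zero.
    for i in range(mx):
        for k in range(nz):
            line = dft1d([out[i][j][k] for j in range(ny)])
            for j in range(ny):
                out[i][j][k] = line[j]
            count += 1
    # Pass 3 (along x): every line may now be nonzero.
    for j in range(ny):
        for k in range(nz):
            line = dft1d([out[i][j][k] for i in range(nx)])
            for i in range(nx):
                out[i][j][k] = line[i]
            count += 1
    return out, count

def dft3d_direct(a):
    """Reference 3-D DFT evaluated straight from the definition."""
    nx, ny, nz = len(a), len(a[0]), len(a[0][0])
    w = lambda p, q, n: cmath.exp(-2j * cmath.pi * p * q / n)
    return [[[sum(a[x][y][z] * w(u, x, nx) * w(v, y, ny) * w(s, z, nz)
                  for x in range(nx) for y in range(ny) for z in range(nz))
              for s in range(nz)] for v in range(ny)] for u in range(nx)]

# A 2x2x2 block of data zero-padded into a 4x4x4 grid.
n, m = 4, 2
a = [[[float(1 + 4 * i + 2 * j + k) if i < m and j < m and k < m else 0.0
       for k in range(n)] for j in range(n)] for i in range(n)]

fast, count = dft3d_padded(a, m, m)
ref = dft3d_direct(a)
err = max(abs(fast[i][j][k] - ref[i][j][k])
          for i in range(n) for j in range(n) for k in range(n))
full = 3 * n * n  # 1-D transforms needed when no line is skipped
print(count, full, err)
```

In this toy setting the three passes need `mx*my + mx*nz + ny*nz` one-dimensional transforms instead of `3*n*n`, and the result agrees with the direct evaluation to rounding error. A production kernel would replace `dft1d` with an FFT and could distribute the independent lines of each pass across threads or processes.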