Matrix-vector products (mat-vecs) form the core of iterative methods for solving dense linear systems. Such systems often arise in the solution of integral equations in electromagnetics, heat transfer, and wave propagation. In this paper, we present a parallel approximate method for computing the mat-vecs used in the solution of integral equations. We use this method to compute dense mat-vecs involving hundreds of thousands of elements. The combined speedup from approximate methods and parallel processing represents an improvement of several orders of magnitude over exact mat-vecs on uniprocessors. We demonstrate that our parallel formulation incurs minimal parallel processing overhead and scales up to a large number of processors. We study the impact of varying the accuracy of the approximate mat-vec on overall time and on parallel efficiency. Experimental results are presented for 256-processor Cray T3D and Thinking Machines CM-5 parallel computers. We have achieved computation rates in excess of 5 GFLOPS on the T3D.
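To make the accuracy-versus-work trade-off of an approximate mat-vec concrete, the following is a minimal sketch and not the paper's method, which the abstract does not specify. It assumes a 1/r kernel, positive source weights, and a single cluster of sources that is well separated from the targets; the function names `exact_far_field` and `monopole_far_field` are hypothetical. The cluster is replaced by one pseudo-source at its weighted centroid, so the cost drops from one kernel evaluation per (target, source) pair to one per target.

```python
import math
import random


def exact_far_field(targets, sources, weights):
    """Exact block of the mat-vec: y_i = sum_j w_j / |t_i - s_j|."""
    return [
        sum(w / math.dist(t, s) for w, s in zip(weights, sources))
        for t in targets
    ]


def monopole_far_field(targets, sources, weights):
    """Approximate the same block with a single pseudo-source.

    Assumption: weights are positive, so the weighted centroid is
    well defined and lies inside the source cluster.
    """
    total = sum(weights)
    centroid = tuple(
        sum(w * s[k] for w, s in zip(weights, sources)) / total
        for k in range(3)
    )
    return [total / math.dist(t, centroid) for t in targets]


random.seed(0)

# Sources in a unit cube centred at the origin (illustrative data).
sources = [tuple(random.uniform(-0.5, 0.5) for _ in range(3)) for _ in range(200)]
weights = [random.uniform(0.5, 1.5) for _ in sources]

# Targets in a unit cube centred 20 units away, i.e. well separated.
targets = [
    (20.0 + random.uniform(-0.5, 0.5),
     random.uniform(-0.5, 0.5),
     random.uniform(-0.5, 0.5))
    for _ in range(50)
]

exact = exact_far_field(targets, sources, weights)
approx = monopole_far_field(targets, sources, weights)

# Largest relative error over all targets.
max_rel_err = max(abs(a - e) / abs(e) for a, e in zip(approx, exact))

# Kernel evaluations needed by each variant.
exact_evals = len(targets) * len(sources)
approx_evals = len(targets)

print(f"max relative error: {max_rel_err:.2e}")
print(f"kernel evaluations: exact={exact_evals}, approximate={approx_evals}")
```

In this sketch the error shrinks as the ratio of cluster size to separation distance decreases, which is the knob that hierarchical approximate methods typically expose when trading accuracy for time.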