An experimental study of methods for parallel preconditioned Krylov methods

Authors:
D. Baxter;J. Saltz;M. Schultz;S. Eisenstat;K. Crowley
Affiliations:
Department of Computer Science, Yale University, New Haven, CT (all authors)
Venue:
C3P Proceedings of the third conference on Hypercube concurrent computers and applications - Volume 2
Year:
1989

Abstract

High-performance multiprocessor architectures differ both in the number of processors and in the delay costs for synchronization and communication. To obtain good performance on a given architecture for a given problem, adequate parallelization, good load balance, and an appropriate choice of granularity are essential. We discuss the implementation of a parallel version of PCGPAK for both shared-memory architectures and hypercubes. Our parallel implementation is efficient enough to solve our test problems on 16 processors of the Encore Multimax/320 in a small multiple of the time required by a single head of a Cray X/MP, even though the peak performance of the Multimax processors is not even close to the supercomputer range. We illustrate the effectiveness of our approach on a number of model problems from reservoir engineering and mathematics.