A bridging model for parallel computation
Communications of the ACM
An efficient nonsymmetric Lanczos method on parallel vector computers
Journal of Computational and Applied Mathematics
An implementation of the look-ahead Lanczos algorithm for non-Hermitian matrices
SIAM Journal on Scientific Computing
An implementation of the QMR method based on coupled two-term recurrences
SIAM Journal on Scientific Computing
Parallel iterative solution of sparse linear systems on a transputer network
Parallel computation
Solving Linear Systems on Vector and Shared Memory Computers
Solving Linear Systems on Vector and Shared Memory Computers
Parallel Ocean Flow Computations on a Regular and on an Irregular Grid
HPCN Europe 1996 Proceedings of the International Conference and Exhibition on High-Performance Computing and Networking
The Improved Quasi-minimal Residual Method on Massively Distributed Memory Computers
HPCN Europe '97 Proceedings of the International Conference and Exhibition on High-Performance Computing and Networking
Parallel iterative solution methods for linear finite element computations on the Cray T3D
HPCN Europe '95 Proceedings of the International Conference and Exhibition on High-Performance Computing and Networking
A Parallel Version of the Quasi-Minimal Residual Method, Based on Coupled Two-Term Recurrences
PARA '96 Proceedings of the Third International Workshop on Applied Parallel Computing, Industrial Computation and Optimization
Solving Sparse Least Squares Problems on Massively Distributed Memory Computers
APDC '97 Proceedings of the 1997 Advances in Parallel and Distributed Computing Conference (APDC '97)
Benchmarking the CLI for I/O-Intensive Computing
IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Workshop 13 - Volume 14
For the solution of unsymmetric linear systems of equations, we have proposed an improved version of the quasi-minimal residual (IQMR) method [21] that uses the Lanczos process as a major component and combines elements of numerical stability and parallel algorithm design. For the Lanczos process, stability is obtained by a coupled two-term procedure that generates Lanczos vectors scaled to unit length. The algorithm is derived such that all inner products and matrix-vector multiplications of a single iteration step are independent, and the communication time required for the inner products can be overlapped efficiently with computation time. In this paper, we use the Bulk Synchronous Parallel (BSP) model to design a fully efficient, scalable, and portable parallel IQMR algorithm and to provide accurate performance predictions for a wide range of architectures, including the Cray T3D, the Parsytec GC/PowerPlus, and a cluster of workstations connected by Ethernet. This performance model gives useful insight into the time complexity of the IQMR method using only a few system-dependent parameters, based on simple and accurate cost modeling. The theoretical performance predictions are compared with measured timing results of a numerical application from ocean flow simulation.
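To illustrate what "only a few system-dependent parameters" means in a BSP setting, the sketch below evaluates the standard BSP superstep cost w + h*g + l, where w is the maximum local work, h the maximum number of words sent or received by any processor, g the per-word communication cost, and l the synchronization cost. This is the generic textbook cost formula, not the IQMR cost model of the paper. The names `BSPMachine`, `Superstep`, and `inner_product_steps`, the all-to-all exchange of partial sums, and the machine values used in the example are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BSPMachine:
    """Hypothetical machine description; g and l are in flop-time units."""
    p: int      # number of processors
    g: float    # communication cost per data word
    l: float    # cost of one barrier synchronization


@dataclass(frozen=True)
class Superstep:
    w: float    # maximum local work (flops) on any processor
    h: int      # maximum words sent or received by any processor


def superstep_cost(m: BSPMachine, s: Superstep) -> float:
    """Standard BSP cost of one superstep: w + h*g + l."""
    return s.w + s.h * m.g + m.l


def program_cost(m: BSPMachine, steps: list[Superstep]) -> float:
    """Total predicted cost is the sum over all supersteps."""
    return sum(superstep_cost(m, s) for s in steps)


def inner_product_steps(n: int, p: int) -> list[Superstep]:
    """Assumed scheme: each processor computes a partial sum over n/p
    elements, sends it to the other p-1 processors, then adds up the
    p partial sums locally."""
    local = Superstep(w=2.0 * n / p, h=p - 1)   # multiply-add + exchange
    combine = Superstep(w=float(p - 1), h=0)    # sum the partial results
    return [local, combine]


if __name__ == "__main__":
    # Illustrative parameter values only, not measurements of any real system.
    machine = BSPMachine(p=4, g=2.0, l=100.0)
    print(program_cost(machine, inner_product_steps(1000, machine.p)))
```

With such a model, comparing architectures amounts to changing `p`, `g`, and `l` while the superstep descriptions of the algorithm stay fixed, which is what makes the prediction portable across machines.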