Parallel Performance Analysis of the Improved Quasi-Minimal Residual Method on Bulk Synchronous Parallel Architectures

  • Authors:
  • Tianruo Yang; Hai-Xiang Lin

  • Affiliations:
  • Department of Computer and Information Science, Linköping University, S-581 83, Linköping, Sweden; Department of Technical Mathematics and Computer Science, TU Delft, Mekelweg 4, 2628 CD, Delft, The Netherlands

  • Venue:
  • The Journal of Supercomputing
  • Year:
  • 1999

Abstract

For the solution of unsymmetric linear systems of equations, we have proposed an improved version of the quasi-minimal residual (IQMR) method [21] that uses the Lanczos process as a major component and combines elements of numerical stability and parallel algorithm design. For the Lanczos process, stability is obtained by a coupled two-term procedure that generates Lanczos vectors scaled to unit length. The algorithm is derived such that all inner products and matrix-vector multiplications of a single iteration step are independent, so the communication time required for the inner products can be overlapped efficiently with computation time. In this paper, we use the Bulk Synchronous Parallel (BSP) model to design a fully efficient, scalable and portable parallel IQMR algorithm and to provide accurate performance prediction of the algorithm for a wide range of architectures, including the Cray T3D, the Parsytec GC/PowerPlus, and a cluster of workstations connected by Ethernet. Built on a simple and accurate cost model, this performance model gives useful insight into the time complexity of the IQMR method using only a few system-dependent parameters. The theoretical performance predictions are compared with measured timing results of a numerical application from ocean flow simulation.
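Why independent inner products matter under BSP can be illustrated with the standard BSP superstep cost w + h·g + l, where w is the maximum local work on any processor, h the maximum number of words any processor sends or receives, g the per-word communication cost, and l the barrier synchronization cost. The sketch below is a minimal illustration under our own assumptions (block-distributed vectors, partial sums exchanged all-to-all, g and l expressed in flop units). The names `BSPMachine`, `superstep_cost` and `inner_products_cost`, and all parameter values, are hypothetical; they do not reproduce the paper's actual IQMR cost model or measured machine parameters.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BSPMachine:
    """Hypothetical BSP machine parameters; g and l are in flop-time units."""
    p: int    # number of processors
    s: float  # computation rate (flop/s), used to convert flops to seconds
    g: float  # communication cost per word of an h-relation (flops)
    l: float  # cost of one barrier synchronization (flops)


def superstep_cost(w: float, h: float, m: BSPMachine) -> float:
    """Standard BSP superstep cost w + h*g + l, in flops."""
    return w + h * m.g + m.l


def inner_products_cost(n: int, k: int, m: BSPMachine, fused: bool) -> float:
    """BSP cost of k global inner products of length-n vectors (block distribution).

    Each processor forms a local partial sum (2n/p flops), sends it to the
    other p-1 processors, and adds the p-1 received values. For simplicity
    the final additions are charged to the same superstep.
    """
    w_one = 2.0 * n / m.p + (m.p - 1)  # local flops per inner product
    h_one = m.p - 1                    # words sent/received per inner product
    if fused:
        # Independent inner products share a single superstep and one barrier.
        return superstep_cost(k * w_one, k * h_one, m)
    # Dependent inner products: one superstep (and one barrier) each.
    return k * superstep_cost(w_one, h_one, m)


if __name__ == "__main__":
    machine = BSPMachine(p=64, s=1.0e8, g=10.0, l=1000.0)  # illustrative values only
    n, k = 1_000_000, 4
    separate = inner_products_cost(n, k, machine, fused=False)
    fused = inner_products_cost(n, k, machine, fused=True)
    print(f"separate: {separate:.0f} flops = {separate / machine.s:.2e} s")
    print(f"fused:    {fused:.0f} flops = {fused / machine.s:.2e} s")
    print(f"saved:    {separate - fused:.0f} flops, i.e. (k-1)*l")
```

Under these assumptions, grouping k independent inner products into one superstep saves exactly (k-1)·l, so the benefit grows with the synchronization cost l and is largest on machines where barriers are expensive, such as a workstation cluster on Ethernet.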