GMRES: a generalized minimal residual algorithm for solving nonsymmetric linear systems
SIAM Journal on Scientific and Statistical Computing
On vectorizing incomplete factorization and SSOR preconditioners
SIAM Journal on Scientific and Statistical Computing
CGS, a fast Lanczos-type solver for nonsymmetric linear systems
SIAM Journal on Scientific and Statistical Computing
s-step iterative methods for symmetric linear systems
Journal of Computational and Applied Mathematics
Introduction to parallel computing: design and analysis of algorithms
Parallel Computers Two: Architecture, Programming and Algorithms
Solving Linear Systems on Vector and Shared Memory Computers
Mining and visualizing recommendation spaces for elliptic PDEs with continuous attributes
ACM Transactions on Mathematical Software (TOMS) - Special issue in honor of John Rice's 65th birthday
Mining and visualizing recommendation spaces for PDE solvers: the continuous attributes case
Computational science, mathematics and software
An improved parallel hybrid bi-conjugate gradient method suitable for distributed parallel computing
Journal of Computational and Applied Mathematics
We develop a performance model for Krylov subspace methods implemented on distributed-memory parallel computers whose underlying communication network is a two-dimensional mesh. The model is based on the runtime of a single iteration, or of a cycle of iterations for methods like GMRES(m), because the iteration count is problem dependent. Moreover, we intend the model only for parallel implementations that do not (significantly) change the mathematical properties of the method. The main purpose of the model is a qualitative analysis of the performance; it is not meant for very accurate predictions. We express the efficiency, speed-up, and runtime as functions of the number of processors scaled by the number of processors that gives the minimal runtime for the given problem size (P_max). This provides a natural way to analyze the performance characteristics over the range of processor counts that can be used effectively. The approach is particularly interesting because the performance turns out to be characterized completely by the sequential runtime and P_max: the efficiency as a function of the number of processors relative to P_max is independent of the problem size and of the parameters describing the machine and the solution method. Analogous relations hold for the speed-up and the runtime. P_max itself, of course, depends on N and on the other parameters, and a simple equation for P_max is given. The performance model is also used to evaluate the improvement in performance when communication is reduced as described in [7,9,8]. Although the scope of the performance model is limited by assumptions on the load balance and the processor grid, there are several obvious generalizations. One important and straightforward generalization is to higher-dimensional meshes; we discuss such generalizations at the end of this article.
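The kind of analysis described above can be illustrated with a small, hypothetical cost model. The specific coefficients and the sqrt(P) communication term below are assumptions chosen for illustration, not the actual model derived in this article; they merely show how a P_max arises from minimizing the runtime and how the efficiency can collapse onto a single curve once the processor count is scaled by P_max:

```python
import math

# Hypothetical per-iteration cost model on a 2D mesh (illustrative only):
# computation time scales as 1/P, communication overhead grows as sqrt(P).

def runtime(P, t_comp, t_comm):
    """Runtime of one iteration on P processors; t_comp is the
    sequential compute time, t_comm the communication coefficient."""
    return t_comp / P + t_comm * math.sqrt(P)

def p_max(t_comp, t_comm):
    """Processor count minimizing the runtime: setting dT/dP = 0
    gives P_max = (2 * t_comp / t_comm) ** (2/3)."""
    return (2.0 * t_comp / t_comm) ** (2.0 / 3.0)

def efficiency(P, t_comp, t_comm):
    """Parallel efficiency E(P) = T_seq / (P * T(P))."""
    return t_comp / (P * runtime(P, t_comp, t_comm))

# Under this model, E = 1 / (1 + 2 * (P / P_max)**1.5): the efficiency
# as a function of P / P_max is independent of t_comp and t_comm
# individually, mirroring the scaling property claimed in the abstract.
```

Here the collapse follows algebraically: E(P) = 1 / (1 + (t_comm / t_comp) * P**1.5), and substituting P = x * P_max eliminates both coefficients, leaving 1 / (1 + 2 * x**1.5).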