GMRES: a generalized minimal residual algorithm for solving nonsymmetric linear systems
SIAM Journal on Scientific and Statistical Computing
On vectorizing incomplete factorization and SSOR preconditioners
SIAM Journal on Scientific and Statistical Computing
CGS, a fast Lanczos-type solver for nonsymmetric linear systems
SIAM Journal on Scientific and Statistical Computing
s-step iterative methods for symmetric linear systems
Journal of Computational and Applied Mathematics
Introduction to parallel computing: design and analysis of algorithms
Parallel Computers Two: Architecture, Programming and Algorithms
Solving Linear Systems on Vector and Shared Memory Computers
Mining and visualizing recommendation spaces for elliptic PDEs with continuous attributes
ACM Transactions on Mathematical Software (TOMS) - Special issue in honor of John Rice's 65th birthday
Mining and visualizing recommendation spaces for PDE solvers: the continuous attributes case
Computational science, mathematics and software
An improved parallel hybrid bi-conjugate gradient method suitable for distributed parallel computing
Journal of Computational and Applied Mathematics
We develop a performance model for Krylov subspace methods implemented on distributed-memory parallel computers whose underlying communication network is a two-dimensional mesh. The model is based on the runtime of a single iteration, or of a cycle of iterations for methods like GMRES(m), because the iteration count is problem dependent. Moreover, we intend the model only for parallel implementations that do not (significantly) change the mathematical properties of the method. The main purpose of the model is a qualitative analysis of the performance; it is not meant for very accurate predictions. We express the efficiency, speed-up, and runtime as functions of the number of processors scaled by the number of processors that gives the minimal runtime for the given problem size (P_max). This provides a natural way to analyze the performance characteristics over the range of processor counts that can be used effectively. The approach is particularly interesting because the performance turns out to be characterized completely by the sequential runtime and P_max: the efficiency as a function of the number of processors relative to P_max is independent of the problem size and of the parameters describing the machine and the solution method. Analogous relations hold for the speed-up and the runtime. P_max itself, of course, depends on N and on the other parameters, and a simple equation for P_max is given. The performance model is also used to evaluate the improvement in performance when communication is reduced as described in [7,9,8]. Although the scope of the performance model is limited by assumptions on the load balance and the processor grid, there are several obvious generalizations. One important and straightforward generalization is to higher-dimensional meshes; we discuss such generalizations at the end of this article.
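The kind of analysis described above can be illustrated with a small, hypothetical cost model. The specific coefficients and the sqrt(P) communication term below are assumptions chosen for illustration, not the actual model derived in this article; they merely show how a P_max arises from minimizing the runtime and how the efficiency can collapse onto a single curve once the processor count is scaled by P_max:

```python
import math

# Hypothetical per-iteration cost model on a 2D mesh (illustrative only):
# computation time scales as 1/P, communication overhead grows as sqrt(P).

def runtime(P, t_comp, t_comm):
    """Runtime of one iteration on P processors; t_comp is the
    sequential compute time, t_comm the communication coefficient."""
    return t_comp / P + t_comm * math.sqrt(P)

def p_max(t_comp, t_comm):
    """Processor count minimizing the runtime: setting dT/dP = 0
    gives P_max = (2 * t_comp / t_comm) ** (2/3)."""
    return (2.0 * t_comp / t_comm) ** (2.0 / 3.0)

def efficiency(P, t_comp, t_comm):
    """Parallel efficiency E(P) = T_seq / (P * T(P))."""
    return t_comp / (P * runtime(P, t_comp, t_comm))

# Under this model, E = 1 / (1 + 2 * (P / P_max)**1.5): the efficiency
# as a function of P / P_max is independent of t_comp and t_comm
# individually, mirroring the scaling property claimed in the abstract.
```

Here the collapse follows algebraically: E(P) = 1 / (1 + (t_comm / t_comp) * P**1.5), and substituting P = x * P_max eliminates both coefficients, leaving 1 / (1 + 2 * x**1.5).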