GPU computing with Kaczmarz's and other iterative algorithms for linear systems

Authors:
Joseph M. Elble; Nikolaos V. Sahinidis; Panagiotis Vouzis
Affiliations:
University of Illinois Urbana-Champaign, Department of Industrial and Enterprise Systems Engineering, Urbana, IL 61801, United States; Carnegie Mellon University, Department of Chemical Engineering, 5000 Forbes Avenue, Pittsburgh, PA 15213, United States; Carnegie Mellon University, Department of Chemical Engineering, 5000 Forbes Avenue, Pittsburgh, PA 15213, United States
Venue:
Parallel Computing
Year:
2010

Abstract

The graphics processing unit (GPU) is used to solve large linear systems derived from partial differential equations. The differential equations studied are strongly convection-dominated, of various sizes, and common to many fields, including computational fluid dynamics, heat transfer, and structural mechanics. The paper presents comparisons between GPU and CPU implementations of several well-known iterative methods, including Kaczmarz's, Cimmino's, component averaging, conjugate gradient normal residual (CGNR), symmetric successive overrelaxation-preconditioned conjugate gradient, and conjugate-gradient-accelerated component-averaged row projections (CARP-CG). Computations are performed with dense as well as general banded systems. The results demonstrate that our GPU implementation outperforms CPU implementations of these algorithms, as well as previously studied parallel implementations on Linux clusters and shared memory systems. While the CGNR method had begun to fall out of favor for solving such problems, for the problems studied in this paper, the CGNR method implemented on the GPU performed better than the other methods, including a cluster implementation of the CARP-CG method.
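
For readers unfamiliar with two of the methods named in the abstract, the following is a minimal CPU-side sketch in pure Python of the textbook forms of Kaczmarz's row projection method and of CGNR (conjugate gradients applied to the normal equations A^T A x = A^T b). It is not the authors' GPU implementation; the function names, the small nonsymmetric test matrix, and the parameters `sweeps`, `relax`, `iters`, and `tol` are assumptions chosen only for illustration.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))


def matvec(A, x):
    """Compute A x for a dense matrix stored as a list of rows."""
    return [dot(row, x) for row in A]


def rmatvec(A, y):
    """Compute A^T y without forming the transpose explicitly."""
    return [sum(A[i][j] * y[i] for i in range(len(A)))
            for j in range(len(A[0]))]


def kaczmarz(A, b, sweeps=200, relax=1.0):
    """Cyclic Kaczmarz: project x onto each row's hyperplane in turn."""
    x = [0.0] * len(A[0])
    for _ in range(sweeps):
        for row, bi in zip(A, b):
            norm_sq = dot(row, row)
            if norm_sq == 0.0:
                continue  # skip empty rows (illustrative safeguard)
            step = relax * (bi - dot(row, x)) / norm_sq
            x = [xj + step * aj for xj, aj in zip(x, row)]
    return x


def cgnr(A, b, iters=50, tol=1e-20):
    """CGNR: conjugate gradients on the normal equations A^T A x = A^T b."""
    x = [0.0] * len(A[0])
    r = list(b)            # residual b - A x for the zero initial guess
    z = rmatvec(A, r)      # residual of the normal equations
    p = list(z)
    zz = dot(z, z)
    for _ in range(iters):
        if zz <= tol:      # stop once ||A^T r||^2 is negligible
            break
        w = matvec(A, p)
        alpha = zz / dot(w, w)
        x = [xj + alpha * pj for xj, pj in zip(x, p)]
        r = [ri - alpha * wi for ri, wi in zip(r, w)]
        z = rmatvec(A, r)
        zz_new = dot(z, z)
        p = [zj + (zz_new / zz) * pj for zj, pj in zip(z, p)]
        zz = zz_new
    return x


if __name__ == "__main__":
    # Hypothetical 3x3 nonsymmetric banded system, not taken from the paper.
    A = [[4.0, 1.0, 0.0], [-2.0, 4.0, 1.0], [0.0, -2.0, 4.0]]
    b = matvec(A, [1.0, 2.0, 3.0])
    print("Kaczmarz:", kaczmarz(A, b))
    print("CGNR:    ", cgnr(A, b))
```

In this sketch Kaczmarz's method updates the iterate one row at a time, which makes each sweep inherently sequential, whereas each CGNR iteration consists of two matrix-vector products plus a handful of vector operations. That difference is one plausible reason a method built from matrix-vector products maps well onto a GPU, although the abstract itself reports only the measured outcome.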