Forward and back substitution algorithms are widely used to solve linear systems of equations after LU decomposition of the coefficient matrix. They are also essential to high-performance preconditioners, which improve the convergence of iterative methods. In this paper, we describe an efficient approach to implementing forward and back substitution on a GPU and give the implementation details for their use in a Modified Incomplete Cholesky preconditioner for the Conjugate Gradient (CG) algorithm. The resulting forward and back substitution routines are then used in a Modified Incomplete Cholesky Preconditioned Conjugate Gradient method to solve the sparse, symmetric positive definite linear systems arising from the discretization of three-dimensional finite-difference groundwater flow models. By using multiple threads, the proposed method achieves speedups of up to 60 times on a GeForce GTX 280 compared with a CPU implementation, and up to 4.8 times compared with the corresponding cuSPARSE library function, which NVIDIA optimized for GPUs.
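The abstract does not describe the authors' GPU kernels. The following is therefore only a sequential Python sketch, under stated assumptions, of the operations it names: forward substitution with a lower-triangular factor L, back substitution with L^T, and their use as the preconditioning step z = (L L^T)^{-1} r inside preconditioned CG. The CSR layout (diagonal stored last in each row), all function names, and the level-scheduling helper are illustrative assumptions. The factor L would come from a (modified) incomplete Cholesky factorization, which is not shown.

```python
from math import sqrt

# Sketch only. Assumptions (not from the paper): L is lower triangular in CSR
# form (indptr, indices, data) with the diagonal stored last in each row, and
# the preconditioner is M = L L^T, e.g. from an incomplete Cholesky factorization.


def forward_substitution(indptr, indices, data, b):
    """Solve L y = b, one row at a time from top to bottom."""
    y = [0.0] * len(b)
    for i in range(len(b)):
        s = b[i]
        end = indptr[i + 1]
        for k in range(indptr[i], end - 1):   # off-diagonal L[i, j] with j < i
            s -= data[k] * y[indices[k]]
        y[i] = s / data[end - 1]              # divide by diagonal L[i, i]
    return y


def back_substitution_transpose(indptr, indices, data, y):
    """Solve L^T x = y from bottom to top, reusing the CSR rows of L."""
    x = list(y)
    for i in range(len(y) - 1, -1, -1):
        end = indptr[i + 1]
        x[i] /= data[end - 1]                 # contributions from rows > i already subtracted
        for k in range(indptr[i], end - 1):   # scatter L[i, j] * x[i] into x[j], j < i
            x[indices[k]] -= data[k] * x[i]
    return x


def level_sets(indptr, indices):
    """Group rows of L so that each row depends only on rows in earlier levels."""
    n = len(indptr) - 1
    level = [0] * n
    for i in range(n):
        for k in range(indptr[i], indptr[i + 1] - 1):
            level[i] = max(level[i], level[indices[k]] + 1)
    levels = [[] for _ in range(max(level, default=-1) + 1)]
    for i, lv in enumerate(level):
        levels[lv].append(i)
    return levels


def apply_preconditioner(indptr, indices, data, r):
    """z = (L L^T)^{-1} r via one forward and one back substitution."""
    y = forward_substitution(indptr, indices, data, r)
    return back_substitution_transpose(indptr, indices, data, y)


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def pcg(matvec, b, precond, tol=1e-10, max_iter=1000):
    """Preconditioned conjugate gradient for SPD A, starting from x = 0."""
    x = [0.0] * len(b)
    r = list(b)
    z = precond(r)
    p = list(z)
    rz = dot(r, z)
    for it in range(max_iter):
        if sqrt(dot(r, r)) < tol:
            return x, it
        ap = matvec(p)
        alpha = rz / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        z = precond(r)
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter
```

Rows in the same level set do not depend on one another, so a parallel implementation can solve them concurrently, for example one GPU thread per row. This level-scheduling strategy is a common general technique. The abstract does not say whether the paper uses it or a different scheme.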