Research on the conjugate gradient algorithm with a modified incomplete Cholesky preconditioner on GPU

Authors: Jiaquan Gao; Ronghua Liang; Jun Wang
Venue: Journal of Parallel and Distributed Computing
Year: 2014

Abstract

In this study, we identify the parallelism in the forward/backward substitutions (FBS) for two cases and, on that basis, propose an efficient preconditioned conjugate gradient algorithm with the modified incomplete Cholesky preconditioner on the GPU (GPUMICPCGA). GPUMICPCGA has three distinguishing characteristics: (1) the vector operations are optimized by grouping several of them into single kernels, (2) a new inner-product kernel and a new, highly optimized sparse matrix-vector multiplication kernel are presented, and (3) an efficient parallel implementation of FBS on the GPU (GPUFBS) is proposed for the two cases. Numerical results show that the proposed kernels outperform the corresponding ones in CUBLAS and CUSPARSE, and that GPUFBS is almost 3 times faster than an implementation of FBS using the CUSPARSE library. Furthermore, GPUMICPCGA performs better than its counterpart implemented with the CUBLAS and CUSPARSE libraries.
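
For readers unfamiliar with FBS, the following is a minimal serial sketch of what the preconditioning step computes: given a lower-triangular factor L, it solves L y = r by forward substitution and then L^T z = y by backward substitution. It is not the GPUFBS kernel from the paper, and it does not reproduce the two parallel cases the abstract mentions. The CSR layout, the function names, and the assumption that every row of L stores a nonzero diagonal entry are illustrative choices made here.

```python
def forward_substitution(n, row_ptr, col_idx, vals, b):
    """Solve L y = b for a lower-triangular L stored in CSR form (assumed layout)."""
    y = [0.0] * n
    for i in range(n):
        s = b[i]
        diag = None
        for k in range(row_ptr[i], row_ptr[i + 1]):
            j = col_idx[k]
            if j == i:
                diag = vals[k]
            else:
                s -= vals[k] * y[j]  # j < i, so y[j] is already known
        y[i] = s / diag
    return y


def backward_substitution(n, row_ptr, col_idx, vals, y):
    """Solve L^T z = y using the same CSR storage of L (no explicit transpose)."""
    z = list(y)
    for i in range(n - 1, -1, -1):
        diag = None
        for k in range(row_ptr[i], row_ptr[i + 1]):
            if col_idx[k] == i:
                diag = vals[k]
        z[i] /= diag  # all contributions from rows below i were already subtracted
        for k in range(row_ptr[i], row_ptr[i + 1]):
            j = col_idx[k]
            if j != i:
                z[j] -= vals[k] * z[i]  # push z[i] into the earlier unknowns
    return z


def apply_preconditioner(n, row_ptr, col_idx, vals, r):
    """Return z with (L L^T) z = r: one forward sweep, then one backward sweep."""
    y = forward_substitution(n, row_ptr, col_idx, vals, r)
    return backward_substitution(n, row_ptr, col_idx, vals, y)
```

Each unknown depends on previously computed ones, which is why FBS is the hard part to parallelize in a preconditioned conjugate gradient solver and why the abstract singles it out. As a usage example, calling `apply_preconditioner(3, [0, 1, 3, 5], [0, 0, 1, 1, 2], [2.0, 1.0, 3.0, 4.0, 5.0], [8.0, 58.0, 147.0])` with the 3x3 factor L = [[2, 0, 0], [1, 3, 0], [0, 4, 5]] recovers z = [1, 2, 3].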