Accelerating PQMRCGSTAB algorithm on GPU

  • Authors:
  • Canqun Yang;Zhen Ge;Juan Chen;Feng Wang;Qiang Wu

  • Affiliations:
  • School of Computer Science, National University of Defense Technology, Changsha, Hunan 410073, China

  • Venue:
  • Proceedings of the combined workshops on UnConventional high performance computing workshop plus memory access workshop
  • Year:
  • 2009


Abstract

General-purpose computation on GPUs is becoming increasingly popular because of the GPU's powerful computing ability. In this paper, our concern is how to use the GPU to accelerate a sparse linear system solver, the preconditioned QMRCGSTAB algorithm (PQMRCGSTAB for short). We implemented a GPU-accelerated PQMRCGSTAB algorithm on an NVIDIA Tesla C870 and applied three optimization methods to improve its performance: reorganizing data through data packing and matrix mending to obtain higher memory bandwidth; using texture memory instead of shared memory; and exploiting kernel mergence. The experimental results show that the GPU-accelerated PQMRCGSTAB algorithm achieves a peak performance of 17.7 GFLOPS on the C870. Compared with an MPI version of PQMRCGSTAB executed on an Intel Xeon quad-core CPU, the GPU-accelerated version reaches a speedup of five for a 640K×640K matrix.