General-purpose computation on GPUs is becoming increasingly popular because of their high computing power. This paper addresses how to use the GPU to accelerate a sparse linear system solver, the preconditioned QMRCGSTAB algorithm (PQMRCGSTAB for short). We implemented a GPU-accelerated PQMRCGSTAB algorithm on an NVIDIA Tesla C870. Three optimization methods are used to improve its performance: reorganizing data through data packing and matrix mending to obtain higher memory bandwidth; using texture memory instead of shared memory; and merging kernels. The experimental results show that the GPU-accelerated PQMRCGSTAB algorithm achieves a peak performance of 17.7 GFLOPS on the C870. Compared with an MPI version of PQMRCGSTAB running on an Intel Xeon quad-core CPU, the GPU-accelerated PQMRCGSTAB reaches a speedup of five for a 640K × 640K matrix.
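The abstract names data packing, matrix mending, and kernel merging without defining them. The following is a minimal CPU-side C++ sketch under our own assumptions, not the paper's implementation. We read "data packing and matrix mending" as padding every row to a common length in an ELLPACK-style, column-major layout, so that GPU threads handling adjacent rows read adjacent addresses. We read "kernel mergence" as fusing consecutive vector operations of the solver into one pass over memory. The names `EllMatrix`, `csr_to_ell`, `ell_spmv`, and `fused_axpy_dot` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical ELLPACK-style storage (assumed reading of "matrix mending"):
// every row is padded to the same width with explicit zeros, and entry k of
// row i is stored at index k * n + i (column-major). If GPU thread i handles
// row i, consecutive threads then read consecutive addresses.
struct EllMatrix {
    std::size_t n = 0;              // number of rows (square matrix assumed)
    std::size_t width = 0;          // padded row length = max nonzeros per row
    std::vector<double> val;        // n * width values, column-major
    std::vector<std::size_t> col;   // n * width column indices, column-major
};

// Convert CSR arrays (row_ptr, col_idx, values) into the padded layout.
EllMatrix csr_to_ell(std::size_t n,
                     const std::vector<std::size_t>& row_ptr,
                     const std::vector<std::size_t>& col_idx,
                     const std::vector<double>& values) {
    EllMatrix m;
    m.n = n;
    for (std::size_t i = 0; i < n; ++i) {
        std::size_t len = row_ptr[i + 1] - row_ptr[i];
        if (len > m.width) m.width = len;
    }
    m.val.assign(n * m.width, 0.0);
    m.col.assign(n * m.width, 0);
    for (std::size_t i = 0; i < n; ++i) {
        // Padding slots point at column i with value 0.0: they add nothing
        // to the product but still index a valid element of x.
        for (std::size_t k = 0; k < m.width; ++k) m.col[k * n + i] = i;
        std::size_t k = 0;
        for (std::size_t p = row_ptr[i]; p < row_ptr[i + 1]; ++p, ++k) {
            m.val[k * n + i] = values[p];
            m.col[k * n + i] = col_idx[p];
        }
    }
    return m;
}

// y = A * x. The outer row loop is what a GPU version would map to threads.
std::vector<double> ell_spmv(const EllMatrix& m, const std::vector<double>& x) {
    assert(x.size() == m.n);
    std::vector<double> y(m.n, 0.0);
    for (std::size_t i = 0; i < m.n; ++i) {
        double sum = 0.0;
        for (std::size_t k = 0; k < m.width; ++k) {
            std::size_t idx = k * m.n + i;
            sum += m.val[idx] * x[m.col[idx]];
        }
        y[i] = sum;
    }
    return y;
}

// Kernel-merging illustration (assumed): instead of one pass for the update
// r -= alpha * v and a second pass for the dot product (r, r), do both in a
// single loop so each element of r is loaded from memory only once.
double fused_axpy_dot(double alpha, const std::vector<double>& v,
                      std::vector<double>& r) {
    assert(v.size() == r.size());
    double dot = 0.0;
    for (std::size_t i = 0; i < r.size(); ++i) {
        r[i] -= alpha * v[i];
        dot += r[i] * r[i];
    }
    return dot;
}
```

On a GPU, each row of `ell_spmv` would typically become one thread, and the irregular reads of `x` are the kind of access that texture memory can serve. This sketch does not model texture fetches or the preconditioner; it only illustrates the memory layout and the fusion idea.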