Accelerating PQMRCGSTAB algorithm on GPU

  • Authors:
  • Canqun Yang;Zhen Ge;Juan Chen;Feng Wang;Qiang Wu

  • Affiliations:
  • School of Computer Science, National University of Defense Technology, Changsha, Hunan 410073, China

  • Venue:
  • Proceedings of the combined workshops on UnConventional high performance computing workshop plus memory access workshop
  • Year:
  • 2009


Abstract

General-purpose computation on GPUs is becoming increasingly popular because of the GPU's powerful computing ability. In this paper, our concern is how to use the GPU to accelerate a sparse linear system solver, the preconditioned QMRCGSTAB algorithm (PQMRCGSTAB for short). We implemented a GPU-accelerated PQMRCGSTAB algorithm on an NVIDIA Tesla C870 and applied three optimization methods to improve its performance: reorganizing data through data packing and matrix mending to obtain higher memory bandwidth; using texture memory instead of shared memory; and exploiting kernel mergence. The experimental results show that the GPU-accelerated PQMRCGSTAB algorithm achieves a peak performance of 17.7 GFLOPS on the C870. Compared with an MPI version of PQMRCGSTAB executed on an Intel Xeon quad-core CPU, the GPU-accelerated version reaches a speedup of five for a 640K×640K matrix.