Adaptive thread distributions for SpMV on a GPU

  • Authors:
  • Dahai Guo; William Gropp

  • Affiliations:
  • University of Illinois at Urbana-Champaign, Urbana, IL; University of Illinois at Urbana-Champaign, Urbana, IL

  • Venue:
  • Proceedings of the Extreme Scaling Workshop
  • Year:
  • 2012


Abstract

We present a simple auto-tuning method to improve the performance of sparse matrix-vector multiply (SpMV) on a GPU. The sparse matrix, stored in CSR format, is sorted in increasing order of the number of nonzero elements per row and partitioned into several ranges. A number of GPU threads per row (TPR) is then assigned to each range of matrix rows so as to balance the workload across GPU threads. Tests show that, compared to the NVIDIA sparse package, the method provides good performance for most of the matrices tested. The auto-tuning approach is easy to implement and the tuning process is fast; unlike some other approaches to this problem, it does not require converting the matrix into several different formats and trying each one to determine the best format for the matrix.
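
As a rough illustration of the threads-per-row idea described above (not the authors' implementation), the CUDA sketch below lets a group of TPR threads cooperate on each CSR row and reduces their partial sums with warp shuffles. The kernel name, the choose_tpr thresholds, and the per-range launch helper are assumptions made for this example; in the paper, the auto-tuner selects the TPR for each range after sorting rows by their nonzero counts.

// Minimal CUDA sketch of a CSR SpMV kernel in which TPR threads cooperate on
// each row.  Kernel, helper names, and thresholds are illustrative only.
#include <cuda_runtime.h>
#include <cstdio>

template <int TPR>   // threads per row; power of two, at most 32
__global__ void csr_spmv_tpr(int row_begin, int row_end,
                             const int* __restrict__ row_ptr,
                             const int* __restrict__ col_idx,
                             const double* __restrict__ vals,
                             const double* __restrict__ x,
                             double* __restrict__ y)
{
    // One group of TPR consecutive threads handles one row of the range.
    int group = row_begin + (blockIdx.x * blockDim.x + threadIdx.x) / TPR;
    int lane  = threadIdx.x % TPR;

    // Each thread of the group strides over the row's nonzeros.
    double sum = 0.0;
    if (group < row_end) {
        for (int j = row_ptr[group] + lane; j < row_ptr[group + 1]; j += TPR)
            sum += vals[j] * x[col_idx[j]];
    }

    // Reduce the TPR partial sums with warp shuffles; every lane of the warp
    // takes part, and out-of-range groups contribute zeros that are dropped.
    for (int offset = TPR / 2; offset > 0; offset /= 2)
        sum += __shfl_down_sync(0xffffffffu, sum, offset, TPR);

    if (group < row_end && lane == 0)
        y[group] = sum;
}

// Toy mapping from the average nonzeros per row of a range to a TPR value.
// The paper's auto-tuner determines such boundaries; these are placeholders.
static int choose_tpr(double avg_nnz_per_row)
{
    if (avg_nnz_per_row <= 4.0)  return 2;
    if (avg_nnz_per_row <= 16.0) return 8;
    return 32;
}

// Launch one range of rows [row_begin, row_end) that shares a single TPR.
// All pointers are device pointers; the rows are assumed to have already
// been sorted by nonzeros per row and packed into contiguous ranges.
static void spmv_range(int tpr, int row_begin, int row_end,
                       const int* row_ptr, const int* col_idx,
                       const double* vals, const double* x, double* y)
{
    const int block  = 256;                 // threads per block
    const int groups = block / tpr;         // rows handled per block
    const int rows   = row_end - row_begin;
    const int grid   = (rows + groups - 1) / groups;

    switch (tpr) {
    case 2:  csr_spmv_tpr<2> <<<grid, block>>>(row_begin, row_end, row_ptr, col_idx, vals, x, y); break;
    case 8:  csr_spmv_tpr<8> <<<grid, block>>>(row_begin, row_end, row_ptr, col_idx, vals, x, y); break;
    case 32: csr_spmv_tpr<32><<<grid, block>>>(row_begin, row_end, row_ptr, col_idx, vals, x, y); break;
    default: std::fprintf(stderr, "unsupported TPR %d\n", tpr);
    }
}

A host driver would then walk the ranges produced by the sort-and-partition step, pick a TPR per range (via a heuristic like choose_tpr or a measured tuning pass), and invoke spmv_range on each range so that short and long rows receive different thread counts.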