Adaptive thread distributions for SpMV on a GPU

  • Authors:
  • Dahai Guo; William Gropp

  • Affiliations:
  • University of Illinois at Urbana-Champaign, Urbana, IL; University of Illinois at Urbana-Champaign, Urbana, IL

  • Venue:
  • Proceedings of the Extreme Scaling Workshop
  • Year:
  • 2012


Abstract

We present a simple auto-tuning method to improve the performance of sparse matrix-vector multiply (SpMV) on a GPU. The sparse matrix, stored in CSR format, is sorted in increasing order of the number of nonzero elements per row and partitioned into several ranges. A number of GPU threads per row (TPR) is then assigned to each range of matrix rows so as to balance the workload across GPU threads. Tests show that, compared to the NVIDIA sparse package, the method provides good performance for most of the matrices tested. The auto-tuning approach is easy to implement and the tuning process is fast; unlike some other approaches to this problem, it does not require converting the matrix into several different formats and trying each one to determine the best format for the matrix.
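
As a rough illustration of the threads-per-row idea described above (not the authors' implementation), the CUDA sketch below lets a group of TPR threads cooperate on each CSR row and reduces their partial sums with warp shuffles. The kernel name, the choose_tpr thresholds, and the per-range launch helper are assumptions made for this example; in the paper, the auto-tuner selects the TPR for each range after sorting rows by their nonzero counts.

// Minimal CUDA sketch of a CSR SpMV kernel in which TPR threads cooperate on
// each row.  Kernel, helper names, and thresholds are illustrative only.
#include <cuda_runtime.h>
#include <cstdio>

template <int TPR>   // threads per row; power of two, at most 32
__global__ void csr_spmv_tpr(int row_begin, int row_end,
                             const int* __restrict__ row_ptr,
                             const int* __restrict__ col_idx,
                             const double* __restrict__ vals,
                             const double* __restrict__ x,
                             double* __restrict__ y)
{
    // One group of TPR consecutive threads handles one row of the range.
    int group = row_begin + (blockIdx.x * blockDim.x + threadIdx.x) / TPR;
    int lane  = threadIdx.x % TPR;

    // Each thread of the group strides over the row's nonzeros.
    double sum = 0.0;
    if (group < row_end) {
        for (int j = row_ptr[group] + lane; j < row_ptr[group + 1]; j += TPR)
            sum += vals[j] * x[col_idx[j]];
    }

    // Reduce the TPR partial sums with warp shuffles; every lane of the warp
    // takes part, and out-of-range groups contribute zeros that are dropped.
    for (int offset = TPR / 2; offset > 0; offset /= 2)
        sum += __shfl_down_sync(0xffffffffu, sum, offset, TPR);

    if (group < row_end && lane == 0)
        y[group] = sum;
}

// Toy mapping from the average nonzeros per row of a range to a TPR value.
// The paper's auto-tuner determines such boundaries; these are placeholders.
static int choose_tpr(double avg_nnz_per_row)
{
    if (avg_nnz_per_row <= 4.0)  return 2;
    if (avg_nnz_per_row <= 16.0) return 8;
    return 32;
}

// Launch one range of rows [row_begin, row_end) that shares a single TPR.
// All pointers are device pointers; the rows are assumed to have already
// been sorted by nonzeros per row and packed into contiguous ranges.
static void spmv_range(int tpr, int row_begin, int row_end,
                       const int* row_ptr, const int* col_idx,
                       const double* vals, const double* x, double* y)
{
    const int block  = 256;                 // threads per block
    const int groups = block / tpr;         // rows handled per block
    const int rows   = row_end - row_begin;
    const int grid   = (rows + groups - 1) / groups;

    switch (tpr) {
    case 2:  csr_spmv_tpr<2> <<<grid, block>>>(row_begin, row_end, row_ptr, col_idx, vals, x, y); break;
    case 8:  csr_spmv_tpr<8> <<<grid, block>>>(row_begin, row_end, row_ptr, col_idx, vals, x, y); break;
    case 32: csr_spmv_tpr<32><<<grid, block>>>(row_begin, row_end, row_ptr, col_idx, vals, x, y); break;
    default: std::fprintf(stderr, "unsupported TPR %d\n", tpr);
    }
}

A host driver would then walk the ranges produced by the sort-and-partition step, pick a TPR per range (via a heuristic like choose_tpr or a measured tuning pass), and invoke spmv_range on each range so that short and long rows receive different thread counts.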