A model-driven partitioning and auto-tuning integrated framework for sparse matrix-vector multiplication on GPUs

Authors:
Ping Guo, He Huang, Qichang Chen, Liqiang Wang, En-Jui Lee, Po Chen
Affiliations:
University of Wyoming (all authors)
Venue:
Proceedings of the 2011 TeraGrid Conference: Extreme Digital Discovery
Year:
2011

Abstract

Sparse matrix-vector multiplication (SpMV) is a common kernel in scientific computing. The graphics processing unit (GPU) has recently emerged as a high-performance computing platform because of its massive processing capability. This paper presents a performance-model-driven approach that partitions a sparse matrix into appropriate storage formats and auto-tunes the configurations of CUDA kernels to improve the performance of SpMV on GPUs. The paper makes the following contributions: (1) We propose an empirical CUDA performance model that predicts the execution time of SpMV CUDA kernels. (2) We design and implement a model-driven partitioning framework that predicts how to split the target sparse matrix into one or more partitions and converts each partition into an appropriate storage format; this design rests on the observation that the storage format of a sparse matrix can significantly affect SpMV performance. (3) We integrate the model-driven partitioning with our previous auto-tuning framework, which automatically adjusts CUDA-specific parameters to optimize performance on a specific GPU. Compared with NVIDIA's existing implementations, our approach delivers substantial gains: average performance improvements of 222%, 197%, and 33% for the CSR vector kernel, the ELL kernel, and the HYB kernel, respectively.
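The partitioning idea in contribution (2) can be illustrated with NVIDIA's HYB format, which stores up to K entries of every row in a zero-padded ELL slab and places each row's overflow in COO. The Python sketch below is a minimal CPU-side illustration under stated assumptions, not the authors' framework: all function names are hypothetical, and `pick_ell_width` uses a made-up linear cost (one weight per padded ELL slot plus one weight per COO entry) as a stand-in for the paper's empirical CUDA performance model, whose form the abstract does not give.

```python
# Minimal sketch: CSR, ELL, and HYB (ELL + COO) storage for SpMV, with a
# hypothetical linear cost model used to pick the ELL width K.
# The cost weights are illustrative assumptions, not the paper's model.

def csr_spmv(indptr, indices, data, x):
    """y = A @ x for A stored in CSR (row pointers, column indices, values)."""
    y = [0.0] * (len(indptr) - 1)
    for row in range(len(y)):
        for k in range(indptr[row], indptr[row + 1]):
            y[row] += data[k] * x[indices[k]]
    return y

def csr_to_hyb(indptr, indices, data, width):
    """Split CSR into an ELL part (first `width` entries per row, zero-padded)
    and a COO part holding each row's overflow entries."""
    n_rows = len(indptr) - 1
    ell_cols = [[0] * width for _ in range(n_rows)]
    ell_vals = [[0.0] * width for _ in range(n_rows)]
    coo = []  # (row, col, value) triples that do not fit in the ELL slab
    for row in range(n_rows):
        for slot, k in enumerate(range(indptr[row], indptr[row + 1])):
            if slot < width:
                ell_cols[row][slot] = indices[k]
                ell_vals[row][slot] = data[k]
            else:
                coo.append((row, indices[k], data[k]))
    return ell_cols, ell_vals, coo

def hyb_spmv(ell_cols, ell_vals, coo, x):
    """y = A @ x: ELL part (each padded slot adds 0.0 * x[0]) plus COO part."""
    y = [sum((v * x[c] for c, v in zip(cols, vals)), 0.0)
         for cols, vals in zip(ell_cols, ell_vals)]
    for row, col, val in coo:
        y[row] += val * x[col]
    return y

def pick_ell_width(indptr, ell_cost=1.0, coo_cost=3.0):
    """Choose K minimizing a toy cost: ell_cost per padded ELL slot plus
    coo_cost per overflow entry. Both weights are hypothetical placeholders
    for an empirical GPU performance model."""
    row_lens = [indptr[r + 1] - indptr[r] for r in range(len(indptr) - 1)]
    best_k, best_cost = 0, float("inf")
    for k in range(max(row_lens, default=0) + 1):
        overflow = sum(max(0, n - k) for n in row_lens)
        cost = ell_cost * k * len(row_lens) + coo_cost * overflow
        if cost < best_cost:
            best_k, best_cost = k, cost
    return best_k
```

For a 4×4 matrix whose rows hold 2, 1, 4, and 1 nonzeros, these default weights make the toy model pick K = 2, leaving two entries of the long third row in COO. A real model would replace the weights with measured or predicted kernel times on the target GPU, which is the role the abstract assigns to its empirical performance model.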