Sparsity: Optimization Framework for Sparse Matrix Kernels
International Journal of High Performance Computing Applications
Scan primitives for GPU computing
Proceedings of the 22nd ACM SIGGRAPH/EUROGRAPHICS symposium on Graphics hardware
Performance Optimization and Modeling of Blocked Sparse Kernels
International Journal of High Performance Computing Applications
Optimization of sparse matrix-vector multiplication on emerging multicore platforms
Proceedings of the 2007 ACM/IEEE conference on Supercomputing
Implementing sparse matrix-vector multiplication on throughput-oriented processors
Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis
Hi-index | 0.00 |
The CUDA model for GPUs presents the programmer with a plethora of different programming options. These includes different memory types, different memory access methods, and different data types. Identifying which options to use and when is a non-trivial exercise. This paper explores the effect of these different options on the performance of a routine that evaluates sparse matrix vector products. A process for analysing performance and selecting the subset of implementations that perform best is proposed. The potential for mapping sparse matrix attributes to optimal CUDA sparse matrix vector product implementation is discussed.