Implementing Blocked Sparse Matrix-Vector Multiplication on NVIDIA GPUs

Authors:
Alexander Monakov; Arutyun Avetisyan
Affiliations:
Institute for System Programming of RAS, Moscow, Russia (both authors)
Venue:
SAMOS '09: Proceedings of the 9th International Workshop on Embedded Computer Systems: Architectures, Modeling, and Simulation
Year:
2009

Abstract

We discuss the implementation of blocked sparse matrix-vector multiplication for NVIDIA GPUs. We outline an algorithm and various optimizations, and identify potential future improvements and open challenges. Compared with a previously published implementation, ours is faster on matrices with many high fill-ratio blocks but slower on matrices with few non-zero elements per row.
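To make "blocked" and "fill ratio" concrete, here is a minimal serial sketch of sparse matrix-vector multiplication over block compressed sparse row (BCSR) storage. It is a CPU reference written under stated assumptions, not the authors' GPU kernel. The fixed 2x2 block shape and the names `BCSRMatrix`, `bcsr_spmv` and `block_fill` are hypothetical. "Fill" is taken here to mean the fraction of a stored block's entries that are true non-zeros, which is an assumption about the abstract's usage. Each stored block carries a single column index, and its entries are multiplied as a small dense tile. Any zeros padded into a block are multiplied like real entries, so denser blocks waste less work.

```python
from dataclasses import dataclass
from typing import List

# Assumed fixed block shape; the abstract does not state which sizes are used.
R, C = 2, 2


@dataclass
class BCSRMatrix:
    """Hypothetical BCSR container: dense R x C blocks stored row-major."""
    block_row_ptr: List[int]  # length = number of block rows + 1
    block_col_idx: List[int]  # block column index of each stored block
    values: List[float]       # R*C entries per stored block, row-major


def bcsr_spmv(a: BCSRMatrix, x: List[float]) -> List[float]:
    """Return y = A x. Every entry of a stored block is multiplied,
    including explicit zeros that were padded into the block."""
    n_block_rows = len(a.block_row_ptr) - 1
    y = [0.0] * (n_block_rows * R)
    for bi in range(n_block_rows):
        for k in range(a.block_row_ptr[bi], a.block_row_ptr[bi + 1]):
            base = k * R * C               # start of block k in values
            xoff = a.block_col_idx[k] * C  # matching slice of x
            for i in range(R):
                acc = 0.0
                for j in range(C):
                    acc += a.values[base + i * C + j] * x[xoff + j]
                y[bi * R + i] += acc
    return y


def block_fill(a: BCSRMatrix) -> float:
    """Fraction of stored block entries that are true non-zeros
    (1.0 means no padding zeros were needed)."""
    nonzero = sum(1 for v in a.values if v != 0.0)
    return nonzero / len(a.values)


# 4x4 example matrix, split into 2x2 blocks (the all-zero block is not stored):
# [1 2 0 0]
# [0 3 0 4]
# [0 0 5 0]
# [0 0 6 7]
a = BCSRMatrix(
    block_row_ptr=[0, 2, 3],
    block_col_idx=[0, 1, 1],
    values=[1, 2, 0, 3,   0, 0, 0, 4,   5, 0, 6, 7],
)
print(bcsr_spmv(a, [1.0, 1.0, 1.0, 1.0]))  # [3.0, 7.0, 5.0, 13.0]
print(round(block_fill(a), 3))             # 0.583
```

In this example, 7 of the 12 stored entries are non-zero. A GPU version would typically assign blocks or block rows to threads. How that mapping is done is a design choice the sketch does not attempt to model.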