High-Performance Matrix-Vector Multiplication on the GPU

Authors:
Hans Henrik Brandenborg Sørensen
Affiliations:
Informatics and Mathematical Modelling, Technical University of Denmark, Lyngby, Denmark
Venue:
Euro-Par'11 Proceedings of the 2011 international conference on Parallel Processing
Year:
2011

Abstract

In this paper, we develop a high-performance GPU kernel for one of the most popular dense linear algebra operations, matrix-vector multiplication. The target hardware is the most recent NVIDIA Tesla 20-series (Fermi architecture), which is designed from the ground up for scientific computing. We show that achieving high performance for dense matrix-vector multiplication is essentially a matter of fully utilizing the fine-grained parallelism of the many-core GPU. We also show that auto-tuning can be successfully applied to the GPU kernel so that it performs well for all matrix shapes and sizes.
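
To make the two ideas in the abstract concrete, the following is a minimal CPU-side sketch in plain Python, not the paper's kernel. It emulates one way a dense product y = Ax might be split at fine granularity: several cooperating "threads" share each row, each accumulating a strided partial sum that is then reduced. It also shows a toy auto-tuner that times a few candidate settings and keeps the fastest. The names `gemv_rowsplit`, `autotune`, and `threads_per_row`, as well as the candidate list, are assumptions made for illustration and do not come from the paper.

```python
import time


def gemv_rowsplit(A, x, threads_per_row):
    """Compute y = A x, emulating `threads_per_row` cooperating threads per row.

    Each emulated thread accumulates a strided partial sum over its row;
    the partial sums are then reduced, much as a GPU kernel might do in
    shared memory. This runs sequentially and only mirrors the work split.
    """
    y = []
    for row in A:
        partial = [0.0] * threads_per_row
        for t in range(threads_per_row):
            # Emulated thread t handles columns t, t + threads_per_row, ...
            for j in range(t, len(row), threads_per_row):
                partial[t] += row[j] * x[j]
        # Reduction step: combine the per-thread partial sums for this row.
        y.append(sum(partial))
    return y


def autotune(A, x, candidates=(1, 2, 4, 8, 16, 32)):
    """Toy auto-tuner: time each candidate setting and return the fastest.

    The candidate values are hypothetical; a real tuner would search the
    kernel's actual launch parameters on the target GPU.
    """
    best, best_time = None, float("inf")
    for tpr in candidates:
        start = time.perf_counter()
        gemv_rowsplit(A, x, tpr)
        elapsed = time.perf_counter() - start
        if elapsed < best_time:
            best, best_time = tpr, elapsed
    return best
```

For example, `gemv_rowsplit([[1, 2, 3], [4, 5, 6]], [1, 0, -1], 2)` gives the same result as an ordinary row-by-row product. Timings from this CPU emulation say nothing about GPU performance; the sketch only illustrates the structure of a tunable work decomposition, where the best setting is expected to depend on the matrix shape.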