Optimizing symmetric dense matrix-vector multiplication on GPUs

Authors:
Rajib Nath;Stanimire Tomov;Tingxing "Tim" Dong;Jack Dongarra
Affiliations:
University of California, San Diego;University of Tennessee, Knoxville;University of Tennessee, Knoxville;University of Tennessee, Knoxville
Venue:
Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis
Year:
2011

Citing 8
Cited 1

LAPACK's user's guide

LAPACK's user's guide
Optimizing matrix multiply using PHiPAC: a portable, high-performance, ANSI C coding methodology

ICS '97 Proceedings of the 11th international conference on Supercomputing
Automated empirical optimization of high performance floating point kernels

Automated empirical optimization of high performance floating point kernels
Benchmarking GPUs to tune dense linear algebra

Proceedings of the 2008 ACM/IEEE conference on Supercomputing
A Note on Auto-tuning GEMM for GPUs

ICCS '09 Proceedings of the 9th International Conference on Computational Science: Part I
An Improved Magma Gemm For Fermi Graphics Processing Units

International Journal of High Performance Computing Applications
Accelerating GPU kernels for dense linear algebra

VECPAR'10 Proceedings of the 9th international conference on High performance computing for computational science
A fully empirical autotuned dense QR factorization for multicore architectures

Euro-Par'11 Proceedings of the 17th international conference on Parallel processing - Volume Part II

Systematic approach in optimizing numerical memory-bound kernels on GPU

Euro-Par'12 Proceedings of the 18th international conference on Parallel processing workshops

Quantified Score

Hi-index	0.00

Visualization

Abstract

GPUs are excellent accelerators for data parallel applications with regular data access patterns. It is challenging, however, to optimize computations with irregular data access patterns on GPUs. One such computation is the Symmetric Matrix Vector product (SYMV) for dense linear algebra. Optimizing the SYMV kernel is important because it forms the basis of fundamental algorithms such as linear solvers and eigenvalue solvers on symmetric matrices. In this work, we present a new algorithm for optimizing the SYMV kernel on GPUs. Our optimized SYMV in single precision brings up to a 7x speed up compared to the (latest) CUBLAS 4.0 NVIDIA library on the GTX 280 GPU. Our SYMV kernel tuned for Fermi C2050 is 4.5x faster than CUBLAS 4.0 in single precision and 2x faster than CUBLAS 4.0 in double precision. Moreover, the techniques used and described in the paper are general enough to be of interest for developing high-performance GPU kernels beyond the particular case of SYMV.