Automatically tuning sparse matrix-vector multiplication for GPU architectures

Authors:
Alexander Monakov; Anton Lokhmotov; Arutyun Avetisyan
Affiliations:
Institute for System Programming of RAS, Moscow, Russian Federation; Department of Computing, Imperial College London, London, United Kingdom; Institute for System Programming of RAS, Moscow, Russian Federation
Venue:
HiPEAC'10 Proceedings of the 5th international conference on High Performance Embedded Architectures and Compilers
Year:
2010

Citing 5
Cited 11

Automatic performance tuning of sparse matrix kernels.
Optimization of sparse matrix-vector multiplication on emerging multicore platforms. Proceedings of the 2007 ACM/IEEE conference on Supercomputing.
Benchmarking GPUs to tune dense linear algebra. Proceedings of the 2008 ACM/IEEE conference on Supercomputing.
Implementing Blocked Sparse Matrix-Vector Multiplication on NVIDIA GPUs. SAMOS '09 Proceedings of the 9th International Workshop on Embedded Computer Systems: Architectures, Modeling, and Simulation.
Concurrent number cruncher: an efficient sparse linear solver on the GPU. HPCC'07 Proceedings of the Third international conference on High Performance Computing and Communications.

Automatically generating and tuning GPU code for sparse matrix-vector multiplication from a high-level representation. Proceedings of the Fourth Workshop on General Purpose Processing on Graphics Processing Units.
Iterative sparse Matrix-Vector multiplication for integer factorization on GPUs. Euro-Par'11 Proceedings of the 17th international conference on Parallel processing - Volume Part II.
A memory accelerator with gather functions for bandwidth-bound irregular applications. Proceedings of the first workshop on Irregular applications: architectures and algorithm.
High-performance sparse matrix-vector multiplication on GPUs for structured grid computations. Proceedings of the 5th Annual Workshop on General Purpose Processing with Graphics Processing Units.
clSpMV: A Cross-Platform OpenCL SpMV Framework on GPUs. Proceedings of the 26th ACM international conference on Supercomputing.
Automatic tuning of the sparse matrix vector product on GPUs based on the ELLR-T approach. Parallel Computing.
GPU-accelerated preconditioned iterative linear solvers. The Journal of Supercomputing.
Accelerating sparse matrix-vector multiplication on GPUs using bit-representation-optimized schemes. SC '13 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis.
yaSpMV: yet another SpMV framework on GPUs. Proceedings of the 19th ACM SIGPLAN symposium on Principles and practice of parallel programming.
Compiled multithreaded data paths on FPGAs for dynamic workloads. Proceedings of the 2013 International Conference on Compilers, Architectures and Synthesis for Embedded Systems.
CUDA-enabled Sparse Matrix-Vector Multiplication on GPUs using atomic operations. Parallel Computing.

Abstract

Graphics processors are increasingly used in scientific applications because of their high computational power, which comes from hardware with multiple levels of parallelism and a memory hierarchy. Sparse matrix computations arise frequently in scientific applications, for example, when solving PDEs on unstructured grids. However, traditional sparse matrix algorithms are difficult to parallelize efficiently for GPUs because of their irregular memory access patterns. In this paper we present a new storage format for sparse matrices that better exploits locality, has a low memory footprint, and enables automatic specialization for various matrices and future devices via parameter tuning. Experimental evaluation demonstrates significant speedups compared to previously published results.
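
To make the "irregular patterns of memory references" concrete, the following is a minimal sketch of sparse matrix-vector multiplication over the widely used compressed sparse row (CSR) layout. It is not the new storage format the paper proposes, which this abstract does not describe; the function name `csr_spmv` and the small example matrix are assumptions chosen purely for illustration.

```python
def csr_spmv(row_ptr, col_idx, values, x):
    """Compute y = A @ x for a matrix A stored in CSR form (illustrative only)."""
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        # The nonzeros of row i occupy positions row_ptr[i] .. row_ptr[i + 1] - 1.
        for k in range(row_ptr[i], row_ptr[i + 1]):
            # Indirect load: which element of x is read depends on col_idx[k],
            # so neighbouring iterations may touch distant memory locations.
            acc += values[k] * x[col_idx[k]]
        y[i] = acc
    return y


# Hypothetical 3x3 example matrix: [[4, 0, 1], [0, 2, 0], [3, 0, 5]]
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
values = [4.0, 1.0, 2.0, 3.0, 5.0]
x = [1.0, 2.0, 3.0]
y = csr_spmv(row_ptr, col_idx, values, x)
```

The data-dependent index `col_idx[k]` and the varying row lengths are what make this kernel hard to map onto GPU hardware, which is the difficulty the abstract refers to.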