Optimization of sparse matrix-vector multiplication using reordering techniques on GPUs

Authors:
Juan C. Pichel;Francisco F. Rivera;Marcos Fernández;Aurelio Rodríguez
Affiliations:
Electronics and Computer Science Dpt., Universidade de Santiago de Compostela, Spain;Electronics and Computer Science Dpt., Universidade de Santiago de Compostela, Spain;Galicia Supercomputing Center (CESGA), Santiago de Compostela, Spain;Galicia Supercomputing Center (CESGA), Santiago de Compostela, Spain
Venue:
Microprocessors & Microsystems
Year:
2012

Abstract

Reordering techniques are a common strategy for improving the performance of sparse matrix operations on CPUs, in particular sparse matrix-vector multiplication (SpMV). In this paper, we evaluate some of the most successful reordering techniques on two different GPUs. The study also considers a number of sparse matrix storage formats, and executions were performed in both single- and double-precision arithmetic. We find that SpMV performance on GPUs is very sensitive to reordering. In particular, we identify several characteristics of the reordered matrices that have a large impact on SpMV performance. In most cases, the reordered matrices outperform the original ones, with speedups of up to 2.6x. We also observe that no single storage format is preferred over the others.
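
To make the terms concrete, the following is a minimal CPU-side sketch in Python of what applying a reordering to a sparse matrix means for SpMV. It is not the authors' GPU implementation. The CSR layout, the helper names (`spmv_csr`, `permute_csr`, `bandwidth`), the 4x4 example matrix and the permutation `perm` are all assumptions made for illustration; the abstract does not name the reordering techniques or storage formats evaluated. Matrix bandwidth is used here only as one commonly cited proxy for locality, not as one of the characteristics the paper reports.

```python
def spmv_csr(row_ptr, col_idx, vals, x):
    """Compute y = A @ x for a matrix A stored in CSR form."""
    y = [0.0] * (len(row_ptr) - 1)
    for i in range(len(y)):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += vals[k] * x[col_idx[k]]
    return y


def permute_csr(row_ptr, col_idx, vals, perm):
    """Symmetric reordering: new row/column i is old row/column perm[i]."""
    inv = [0] * len(perm)
    for new, old in enumerate(perm):
        inv[old] = new
    new_ptr, new_col, new_val = [0], [], []
    for old in perm:
        # Relabel the column indices of this row and keep them sorted.
        entries = sorted(
            (inv[col_idx[k]], vals[k])
            for k in range(row_ptr[old], row_ptr[old + 1])
        )
        new_col.extend(c for c, _ in entries)
        new_val.extend(v for _, v in entries)
        new_ptr.append(len(new_col))
    return new_ptr, new_col, new_val


def bandwidth(row_ptr, col_idx):
    """Largest distance of any stored entry from the main diagonal."""
    return max(
        (abs(i - col_idx[k])
         for i in range(len(row_ptr) - 1)
         for k in range(row_ptr[i], row_ptr[i + 1])),
        default=0,
    )


# Hypothetical 4x4 symmetric matrix with entries far from the diagonal.
row_ptr = [0, 2, 4, 6, 8]
col_idx = [0, 3, 1, 2, 1, 2, 0, 3]
vals = [4.0, 1.0, 3.0, 2.0, 2.0, 5.0, 1.0, 6.0]
x = [1.0, 2.0, 3.0, 4.0]

# Hypothetical permutation that groups coupled rows together.
perm = [0, 3, 1, 2]
p_ptr, p_col, p_val = permute_csr(row_ptr, col_idx, vals, perm)

y = spmv_csr(row_ptr, col_idx, vals, x)
# The input vector must be permuted consistently with the matrix.
y_perm = spmv_csr(p_ptr, p_col, p_val, [x[j] for j in perm])
```

In this sketch the reordered product is the original result with its entries permuted (`y_perm[i] == y[perm[i]]`), so reordering changes only the memory access pattern, not the mathematics. The permutation is chosen so that the nonzeros move closer to the diagonal, which means consecutive rows read nearby entries of `x`.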