Understanding the Performance of Sparse Matrix-Vector Multiplication

Authors:
Georgios Goumas;Kornilios Kourtis;Nikos Anastopoulos;Vasileios Karakasis;Nectarios Koziris
Venue:
PDP '08 Proceedings of the 16th Euromicro Conference on Parallel, Distributed and Network-Based Processing (PDP 2008)
Year:
2008

Citing 0
Cited 11

Optimizing sparse matrix-vector multiplication using index and value compression
Proceedings of the 5th conference on Computing frontiers

Pattern-based sparse matrix representation for memory-efficient SMVM kernels
Proceedings of the 23rd international conference on Supercomputing

Parallel MLEM on Multicore Architectures
ICCS '09 Proceedings of the 9th International Conference on Computational Science: Part I

Haptic rendering of deformable objects using a multiple FPGA parallel computing architecture
Proceedings of the 18th annual ACM/SIGDA international symposium on Field programmable gate arrays

Exploiting dense substructures for fast sparse matrix vector multiplication
International Journal of High Performance Computing Applications

Analyzing the execution of sparse matrix-vector product on the Finisterrae SMP-NUMA system
The Journal of Supercomputing

SimPL: an effective placement algorithm
Proceedings of the International Conference on Computer-Aided Design

Efficient matrix-encoded grammars and low latency parallelization strategies for CYK
IWPT '11 Proceedings of the 12th International Conference on Parsing Technologies

Analysis and performance estimation of the Conjugate Gradient method on multiple GPUs
Parallel Computing

A scalable sparse matrix-vector multiplication kernel for energy-efficient sparse-blas on FPGAs
Proceedings of the 2014 ACM/SIGDA international symposium on Field-programmable gate arrays

A Multiple-FPGA parallel computing architecture for real-time simulation of soft-object deformation
ACM Transactions on Embedded Computing Systems (TECS)


Abstract

In this paper we revisit the performance issues of the widely used sparse matrix-vector multiplication kernel on modern microarchitectures. Previous scientific work reports a number of different factors that may significantly reduce performance. However, the interaction of these factors with the underlying architectural characteristics is not clearly understood, which may lead to misguided and thus unsuccessful attempts at optimization. To gain insight into the details of performance, we conduct a suite of experiments on a rich set of matrices for three different commodity hardware platforms. Based on our experiments, we extract useful conclusions that can serve as guidelines for the subsequent optimization of the kernel.
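
For readers unfamiliar with the kernel, the following is a minimal sketch of sparse matrix-vector multiplication, y = A·x. The abstract does not name a storage format; this sketch assumes the common Compressed Sparse Row (CSR) layout, and the names `spmv_csr`, `row_ptr`, `col_ind`, and `values` are illustrative choices rather than anything taken from the paper.

```python
def spmv_csr(row_ptr, col_ind, values, x):
    """Compute y = A @ x for a matrix A stored in CSR form (assumed format).

    row_ptr[i] .. row_ptr[i + 1] delimits the nonzeros of row i,
    col_ind[k] is the column of the k-th nonzero, values[k] its value.
    """
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            # x is read indirectly through col_ind, so its access
            # pattern depends on the sparsity structure of A.
            acc += values[k] * x[col_ind[k]]
        y[i] = acc
    return y


# Small illustrative matrix (not from the paper):
# [[4, 0, 1],
#  [0, 2, 0],
#  [3, 0, 5]]
row_ptr = [0, 2, 3, 5]
col_ind = [0, 2, 1, 0, 2]
values = [4.0, 1.0, 2.0, 3.0, 5.0]
x = [1.0, 2.0, 3.0]

y = spmv_csr(row_ptr, col_ind, values, x)
print(y)
```

Each nonzero is used for only one multiply-add, and the input vector is read through the index array; this sketch is meant only to show the structure of the loop, not to reproduce any of the paper's measurements or conclusions.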