Sparse Matrix-Vector multiplication (SpMV) is a very challenging computational kernel, since its performance depends greatly on both the input matrix and the underlying architecture. The main problem of SpMV is its high demand for memory bandwidth, which modern commodity architectures cannot yet supply in abundance. One of the most promising optimization techniques for SpMV is blocking, which can reduce the size of the indexing structures needed to store a sparse matrix and therefore alleviate the pressure on the memory subsystem. In this paper, we study and evaluate a number of representative blocking storage formats on a set of modern microarchitectures that can provide up to 64 hardware contexts. The purpose of this paper is to present the merits and drawbacks of each method in relation to the underlying microarchitecture and to provide a consistent overview of the most promising blocking storage methods for sparse matrices that have been presented in the literature.
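To make the effect of blocking on the indexing structures concrete, the following is a minimal sketch that stores the same small matrix in plain CSR and in a fixed-size blocked layout in the style of BCSR. The abstract does not name the formats it evaluates, so the choice of BCSR, the 2x2 block size, the example matrix, and all function names here are assumptions made purely for illustration.

```python
# Illustrative sketch only: function names, block size, and matrix are hypothetical.

def dense_to_csr(A):
    """Convert a dense matrix (list of rows) to CSR arrays."""
    values, col_idx, row_ptr = [], [], [0]
    for row in A:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)
                col_idx.append(j)      # one column index per nonzero
        row_ptr.append(len(values))
    return values, col_idx, row_ptr

def csr_spmv(values, col_idx, row_ptr, x):
    """Compute y = A * x with A in CSR."""
    y = [0.0] * (len(row_ptr) - 1)
    for i in range(len(y)):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += values[k] * x[col_idx[k]]
    return y

def dense_to_bcsr(A, r, c):
    """Convert a dense matrix to r x c blocks (BCSR-style).

    Assumes the matrix dimensions are multiples of r and c.
    Blocks containing at least one nonzero are stored in full,
    so some explicit zeros (padding) may be stored.
    """
    n_rows, n_cols = len(A), len(A[0])
    assert n_rows % r == 0 and n_cols % c == 0
    blocks, bcol_idx, brow_ptr = [], [], [0]
    for bi in range(0, n_rows, r):
        for bj in range(0, n_cols, c):
            block = [A[bi + di][bj + dj] for di in range(r) for dj in range(c)]
            if any(v != 0 for v in block):
                blocks.append(block)
                bcol_idx.append(bj // c)   # one column index per block
        brow_ptr.append(len(blocks))
    return blocks, bcol_idx, brow_ptr

def bcsr_spmv(blocks, bcol_idx, brow_ptr, x, r, c):
    """Compute y = A * x with A stored as r x c blocks."""
    y = [0.0] * ((len(brow_ptr) - 1) * r)
    for bi in range(len(brow_ptr) - 1):
        for k in range(brow_ptr[bi], brow_ptr[bi + 1]):
            block, x0 = blocks[k], bcol_idx[k] * c
            for di in range(r):
                for dj in range(c):
                    y[bi * r + di] += block[di * c + dj] * x[x0 + dj]
    return y

A = [[1, 2, 0, 0],
     [3, 4, 0, 0],
     [0, 0, 5, 6],
     [0, 0, 7, 0]]
x = [1, 2, 3, 4]

values, col_idx, row_ptr = dense_to_csr(A)
blocks, bcol_idx, brow_ptr = dense_to_bcsr(A, 2, 2)

y_csr = csr_spmv(values, col_idx, row_ptr, x)
y_bcsr = bcsr_spmv(blocks, bcol_idx, brow_ptr, x, 2, 2)

csr_index_entries = len(col_idx) + len(row_ptr)      # 7 + 5
bcsr_index_entries = len(bcol_idx) + len(brow_ptr)   # 2 + 3
bcsr_stored_values = sum(len(b) for b in blocks)     # includes one padded zero
```

In this toy case the blocked layout needs 5 index entries instead of 12, at the cost of storing 8 values instead of 7. That trade-off between fewer index entries and extra padded zeros is one reason the benefit of blocking can depend on the structure of the input matrix.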