In earlier work, we presented a one-dimensional cache-oblivious sparse matrix-vector (SpMV) multiplication scheme which has its roots in one-dimensional sparse matrix partitioning. Partitioning is often used in distributed-memory parallel computing for the SpMV multiplication, an important kernel in many applications. A logical extension is to move towards using a two-dimensional partitioning. In this paper, we present our research in this direction, extending the one-dimensional method for cache-oblivious SpMV multiplication to two dimensions, while still allowing only row and column permutations on the sparse input matrix. This extension requires a generalisation of the compressed row storage data structure to a block-based data structure, for which several variants are investigated. Experiments performed on three different architectures show further improvements of the two-dimensional method compared to the one-dimensional method, especially in those cases where the one-dimensional method already provided significant gains. The largest gain obtained by our new reordering is over a factor of 3 in SpMV speed, compared to the natural matrix ordering.
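As a rough illustration of what generalising compressed row storage (CRS) to a block-based data structure can look like, the sketch below stores a matrix as a list of fixed-size blocks, each held in its own CRS with block-local indices. It is a minimal sketch under stated assumptions, not the data structure evaluated in the paper: the names `CRS`, `crs_from_triplets`, `crs_spmv`, `blocked_from_triplets` and `blocked_spmv`, the uniform block size `bs`, and the row-major block order are all hypothetical choices made for this example.

```python
from dataclasses import dataclass
from typing import List, Tuple

Triplet = Tuple[int, int, float]  # (row, column, value)


@dataclass
class CRS:
    """Compressed row storage: row i owns entries row_ptr[i]..row_ptr[i+1]-1."""
    n_rows: int
    row_ptr: List[int]
    col_idx: List[int]
    vals: List[float]


def crs_from_triplets(n_rows: int, triplets: List[Triplet]) -> CRS:
    """Build a CRS structure from (row, col, value) triplets."""
    rows: List[List[Tuple[int, float]]] = [[] for _ in range(n_rows)]
    for i, j, v in triplets:
        rows[i].append((j, v))
    row_ptr, col_idx, vals = [0], [], []
    for row in rows:
        for j, v in sorted(row):  # keep column indices ascending within a row
            col_idx.append(j)
            vals.append(v)
        row_ptr.append(len(col_idx))
    return CRS(n_rows, row_ptr, col_idx, vals)


def crs_spmv(a: CRS, x: List[float], y: List[float],
             row_off: int = 0, col_off: int = 0) -> None:
    """Accumulate y[row_off + i] += sum_j a[i][j] * x[col_off + j]."""
    for i in range(a.n_rows):
        acc = 0.0
        for k in range(a.row_ptr[i], a.row_ptr[i + 1]):
            acc += a.vals[k] * x[col_off + a.col_idx[k]]
        y[row_off + i] += acc


def blocked_from_triplets(n_rows: int, triplets: List[Triplet], bs: int):
    """Assumed layout: bs-by-bs blocks, each stored in CRS with local indices.

    Returns a list of (row_offset, col_offset, CRS) in row-major block order;
    empty blocks are not stored.
    """
    buckets = {}
    for i, j, v in triplets:
        buckets.setdefault((i // bs, j // bs), []).append((i % bs, j % bs, v))
    blocks = []
    for bi, bj in sorted(buckets):
        height = min(bs, n_rows - bi * bs)  # last block row may be shorter
        blocks.append((bi * bs, bj * bs,
                       crs_from_triplets(height, buckets[(bi, bj)])))
    return blocks


def blocked_spmv(blocks, x: List[float], n_rows: int) -> List[float]:
    """Compute y = A x by visiting the stored blocks one after another."""
    y = [0.0] * n_rows
    for row_off, col_off, blk in blocks:
        crs_spmv(blk, x, y, row_off, col_off)
    return y
```

In this sketch each block touches only a contiguous slice of `x` and `y`, which is the property a block-based layout is meant to exploit for cache reuse. The order in which blocks are visited is a free design choice here; row-major order is used only for simplicity.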