We present new performance models and more compact data structures for cache blocking when applied to sparse matrix-vector multiply (SpM × V). We extend our prior models by relaxing the assumption that the vectors fit in cache and find that the new models are accurate enough to predict optimum block sizes. In addition, we determine criteria that predict when cache blocking improves performance. We conclude with architectural suggestions that would make memory systems execute SpM × V faster.
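To illustrate the idea of cache blocking for SpM×V, the sketch below splits a matrix into column blocks, each stored in its own CSR structure, so that each block's multiply touches only a small slice of the source vector that can stay resident in cache. This is a minimal illustration of the general technique, not the paper's actual (more compact) data structures; all function names and the `cblock` parameter are hypothetical.

```python
import numpy as np

def blocked_csr(A_dense, cblock):
    """Partition A into column blocks of width cblock, each stored in CSR form.
    (Illustrative layout; the paper proposes more compact structures.)"""
    n_rows, n_cols = A_dense.shape
    blocks = []
    for c0 in range(0, n_cols, cblock):
        sub = A_dense[:, c0:c0 + cblock]
        rowptr, colidx, vals = [0], [], []
        for i in range(n_rows):
            for j in range(sub.shape[1]):
                if sub[i, j] != 0:
                    colidx.append(j)       # column index local to the block
                    vals.append(sub[i, j])
            rowptr.append(len(vals))
        blocks.append((c0, rowptr, colidx, vals))
    return blocks

def spmv_blocked(blocks, x, n_rows):
    """y = A*x, processing one column block at a time.
    Each block reads only a cblock-wide slice of x, improving its cache reuse."""
    y = np.zeros(n_rows)
    for c0, rowptr, colidx, vals in blocks:
        for i in range(n_rows):
            acc = 0.0
            for k in range(rowptr[i], rowptr[i + 1]):
                acc += vals[k] * x[c0 + colidx[k]]
            y[i] += acc                    # accumulate partial products per block
    return y
```

Choosing `cblock` so the corresponding slice of `x` fits in cache is exactly the tuning decision the performance models aim to predict.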