We present new performance models and more compact data structures for cache blocking when applied to sparse matrix-vector multiply (SpM × V). We extend our prior models by relaxing the assumption that the vectors fit in cache and find that the new models are accurate enough to predict optimum block sizes. In addition, we determine criteria that predict when cache blocking improves performance. We conclude with architectural suggestions that would make memory systems execute SpM × V faster.
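To illustrate the idea of cache blocking for SpM×V, the sketch below splits a matrix into column blocks, each stored in its own CSR structure, so that each block's multiply touches only a small slice of the source vector that can stay resident in cache. This is a minimal illustration of the general technique, not the paper's actual (more compact) data structures; all function names and the `cblock` parameter are hypothetical.

```python
import numpy as np

def blocked_csr(A_dense, cblock):
    """Partition A into column blocks of width cblock, each stored in CSR form.
    (Illustrative layout; the paper proposes more compact structures.)"""
    n_rows, n_cols = A_dense.shape
    blocks = []
    for c0 in range(0, n_cols, cblock):
        sub = A_dense[:, c0:c0 + cblock]
        rowptr, colidx, vals = [0], [], []
        for i in range(n_rows):
            for j in range(sub.shape[1]):
                if sub[i, j] != 0:
                    colidx.append(j)       # column index local to the block
                    vals.append(sub[i, j])
            rowptr.append(len(vals))
        blocks.append((c0, rowptr, colidx, vals))
    return blocks

def spmv_blocked(blocks, x, n_rows):
    """y = A*x, processing one column block at a time.
    Each block reads only a cblock-wide slice of x, improving its cache reuse."""
    y = np.zeros(n_rows)
    for c0, rowptr, colidx, vals in blocks:
        for i in range(n_rows):
            acc = 0.0
            for k in range(rowptr[i], rowptr[i + 1]):
                acc += vals[k] * x[c0 + colidx[k]]
            y[i] += acc                    # accumulate partial products per block
    return y
```

Choosing `cblock` so the corresponding slice of `x` fits in cache is exactly the tuning decision the performance models aim to predict.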