Adjacency-based data reordering algorithm for acceleration of finite element computations
Scientific Programming
Exploiting dense substructures for fast sparse matrix vector multiplication
International Journal of High Performance Computing Applications
Two-dimensional cache-oblivious sparse matrix-vector multiplication
Parallel Computing
An object-oriented bulk synchronous parallel library for multicore programming
Concurrency and Computation: Practice & Experience
Cache-conscious scheduling of streaming applications
Proceedings of the twenty-fourth annual ACM symposium on Parallelism in algorithms and architectures
Partitioning Hypergraphs in Scientific Computing Applications through Vertex Separators on Graphs
SIAM Journal on Scientific Computing
Fast Recommendation on Bibliographic Networks
ASONAM '12 Proceedings of the 2012 International Conference on Advances in Social Networks Analysis and Mining (ASONAM 2012)
Hi-index | 0.00 |
In this article, we introduce a cache-oblivious method for sparse matrix-vector multiplication. Our method attempts to permute the rows and columns of the input matrix using a recursive hypergraph-based sparse matrix partitioning scheme so that the resulting matrix induces cache-friendly behavior during sparse matrix-vector multiplication. Matrices are assumed to be stored in row-major format, by means of the compressed row storage (CRS) or its variants incremental CRS and zig-zag CRS. The zig-zag CRS data structure is shown to fit well with the hypergraph metric used in partitioning sparse matrices for the purpose of parallel computation. The separated block-diagonal (SBD) form is shown to be the appropriate matrix structure for cache enhancement. We have implemented a run-time cache simulation library enabling us to analyze cache behavior for arbitrary matrices and arbitrary cache properties during matrix-vector multiplication within a $k$-way set-associative idealized cache model. The results of these simulations are then verified by actual experiments run on various cache architectures. In all these experiments, we use the Mondriaan sparse matrix partitioner in one-dimensional mode. The savings in computation time achieved by our matrix reorderings reach up to 50 percent, in the case of a large link matrix.