Several fast sequential algorithms have been proposed for multiplying sparse matrices, but these algorithms do not explicitly address the impact of caching on performance. We show that a rather simple, cache-efficient sequential algorithm provides significantly better performance than existing algorithms for sparse matrix multiplication. We then describe a multithreaded implementation of this simple algorithm and show that its performance scales well with the number of threads and CPUs. For 10% sparse, 500 × 500 matrices, the multithreaded version running on a 4-CPU system is more than 41.1 times faster than the well-known BLAS routine, and 14.6 and 44.6 times faster than two other recent techniques for fast sparse matrix multiplication, both of which are relatively difficult to parallelize efficiently.
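The abstract does not spell out the algorithm, so the following is only an illustrative sketch of one classic cache-friendly scheme for sparse matrix multiplication: a row-by-row CSR product in the style of Gustavson, where each output row is built in a dense accumulator so that both operands are traversed sequentially in memory. The function name, argument layout, and CSR representation are assumptions for the example, not the paper's actual implementation.

```python
def spgemm(n, a_ptr, a_col, a_val, b_ptr, b_col, b_val):
    """Compute C = A @ B for n x n matrices stored in CSR form
    (row pointers, column indices, values).

    The product is formed row by row: for each nonzero A[i][k],
    the scaled row k of B is added into a dense accumulator for
    row i of C.  Both A and B are thus scanned in storage order,
    which is what makes this access pattern cache-friendly."""
    c_ptr, c_col, c_val = [0], [], []
    acc = [0.0] * n    # dense accumulator for the current output row
    seen = [-1] * n    # last output row in which column j was touched
    for i in range(n):
        touched = []
        for p in range(a_ptr[i], a_ptr[i + 1]):
            k, a = a_col[p], a_val[p]
            for q in range(b_ptr[k], b_ptr[k + 1]):
                j = b_col[q]
                if seen[j] != i:        # first contribution to C[i][j]
                    seen[j] = i
                    acc[j] = 0.0
                    touched.append(j)
                acc[j] += a * b_val[q]
        for j in sorted(touched):       # pack row i in column order
            c_col.append(j)
            c_val.append(acc[j])
        c_ptr.append(len(c_col))
    return c_ptr, c_col, c_val
```

A multithreaded version of this scheme parallelizes naturally, since distinct output rows can be computed independently by different threads, each with its own accumulator.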