Caching-Efficient Multithreaded Fast Multiplication of Sparse Matrices

  • Authors:
  • Affiliations:
  • Venue:
  • IPPS '98 Proceedings of the 12th. International Parallel Processing Symposium on International Parallel Processing Symposium
  • Year:
  • 1998

Quantified Score

Hi-index 0.01

Visualization

Abstract

Several fast sequential algorithms have been proposed in the past to multiply sparse matrices. These algorithms do not explicitly address the impact of caching on performance. We show that a rather simple sequential cache-efficient algorithm provides significantly better performance than existing algorithms for sparse matrix multiplication. We then describe a multithreaded implementation of this simple algorithm and show that its performance scales well with the number of threads and CPUs. For 10% sparse, 500 脳 500 matrices, the multithreaded version running on 4--CPU systems provides more than a 41.1-fold speed increase over the well-known BLAS routine and a 14.6 fold and 44.6-fold speed increase over two other recent techniques for fast sparse matrix multiplication, both of which are relatively difficult to parallelize efficiently.