Optimal Sparse Matrix Dense Vector Multiplication in the I/O-Model

  • Authors:
  • Michael A. Bender;Gerth Stølting Brodal;Rolf Fagerberg;Riko Jacob;Elias Vicari

  • Affiliations:
  • Stony Brook University, Department of Computer Science, 11794-4400, Stony Brook, NY, USA;Aarhus University, MADALGO, Department of Computer Science, Aarhus, Denmark;University of Southern Denmark, Department of Mathematics and Computer Science, Odense, Denmark;Technische Universität München, Department of Computer Science, Munich, Germany;ETH Zurich, Institute of Theoretical Computer Science, 8092, Zurich, Switzerland

  • Venue:
  • Theory of Computing Systems - Special Title: Parallelism on Algorithms and Architectures (SPAA); Guest Editors: Cyril Gavoille, Boaz Patt-Shamir and Christian Scheideler
  • Year:
  • 2010

Abstract

We study the problem of sparse-matrix dense-vector multiplication (SpMV) in external memory. The task of SpMV is to compute y:=Ax, where A is a sparse N×N matrix and x is a vector. We express sparsity by a parameter k, and for each choice of k consider the class of matrices where the number of nonzero entries is kN, i.e., where the average number of nonzero entries per column is k. We investigate the external worst-case complexity, i.e., the best possible upper bound on the number of I/Os, as a function of k, N and the parameters M (memory size) and B (track size) of the I/O-model. We determine this complexity up to a constant factor for all meaningful choices of these parameters, as long as $k \le N^{1-\varepsilon}$, where ε depends on the problem variant. Our model of computation for the lower bound is a combination of the I/O-models of Aggarwal and Vitter, and of Hong and Kung. We study variants of the problem, differing in the memory layout of A. If A is stored in column major layout, we prove that SpMV has I/O complexity $\Theta(\min\{\frac{kN}{B}\max\{1,\log_{M/B}\frac{N}{\max\{k,M\}}\},\,kN\})$ for $k \le N^{1-\varepsilon}$ and any constant $0 < \varepsilon < 1$ […] $k \le N/2$. In the cache oblivious setting we prove that with the tall cache assumption $M \ge B^{1+\varepsilon}$, the I/O complexity is $\mathcal{O}(\frac{kN}{B}\max\{1,\log_{M/B}\frac{N}{\max\{k,M\}}\})$ for A in column major layout.
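
As an illustration of the objects named in the abstract, the following is a minimal Python sketch, not the authors' algorithm. It assumes that A is given as a list of (row, column, value) triples sorted by column (one way to read "column major layout"), and it evaluates the expression inside the Θ(...) for that layout with all constant factors dropped. The function names spmv_column_major and column_major_bound are hypothetical, and the sketch assumes M > B so that the logarithm base M/B exceeds 1.

```python
import math


def spmv_column_major(n, entries, x):
    """Compute y = A x for an n-by-n sparse matrix A.

    `entries` holds the nonzeros of A as (row, col, value) triples,
    assumed to be sorted by column (illustrative column major layout).
    This is the plain in-memory computation; it does not model I/Os.
    """
    y = [0.0] * n
    for row, col, value in entries:
        y[row] += value * x[col]
    return y


def column_major_bound(k, n, m, b):
    """Evaluate min{ (kN/B) * max{1, log_{M/B}(N / max{k, M})}, kN }.

    This is only the expression inside the Theta(...); hidden constant
    factors are ignored. Assumes m > b so the log base is greater than 1.
    """
    log_term = math.log(n / max(k, m), m / b)
    return min(k * n / b * max(1.0, log_term), k * n)
```

For example, with the hypothetical parameters k = 4, N = 2^20, M = 2^10 and B = 2^5, the logarithmic factor is log_32(1024) = 2, so the first term of the minimum, about 2 · kN/B, is far smaller than kN.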