Optimal sparse matrix dense vector multiplication in the I/O-model

  • Authors:
  • Michael A. Bender; Gerth Stølting Brodal; Rolf Fagerberg; Riko Jacob; Elias Vicari

  • Affiliations:
  • Stony Brook University, Stony Brook, NY; University of Aarhus, Aarhus, Denmark; University of Southern Denmark, Odense M, Denmark; ETH Zurich, Zurich, Switzerland; ETH Zurich, Zurich, Switzerland

  • Venue:
  • Proceedings of the nineteenth annual ACM symposium on Parallel algorithms and architectures
  • Year:
  • 2007

Abstract

We analyze the problem of sparse-matrix dense-vector multiplication (SpMV) in the I/O-model. The task of SpMV is to compute y := Ax, where A is a sparse N × N matrix and x and y are vectors. Here, sparsity is expressed by the parameter k that states that A has a total of at most kN nonzeros, i.e., an average number of k nonzeros per column. The extreme choices for parameter k are well studied special cases, namely for k=1 permuting and for k=N dense matrix-vector multiplication. We study the worst-case complexity of this computational task, i.e., what is the best possible upper bound on the number of I/Os depending on k and N only. We determine this complexity up to a constant factor for large ranges of the parameters. By our arguments, we find that most matrices with kN nonzeros require this number of I/Os, even if the program may depend on the structure of the matrix. The model of computation for the lower bound is a combination of the I/O-models of Aggarwal and Vitter, and of Hong and Kung. We study two variants of the problem, depending on the memory layout of A. If A is stored in column major layout, SpMV has I/O complexity $\Theta\left(\min\left\{\frac{kN}{B}\left(1+\log_{M/B}\frac{N}{\max\{M,k\}}\right),\, kN\right\}\right)$ for $k \le N^{1-\varepsilon}$ and any constant $1 > \varepsilon > 0$. If the algorithm can choose the memory layout, the I/O complexity of SpMV is $\Theta\left(\min\left\{\frac{kN}{B}\left(1+\log_{M/B}\frac{N}{kM}\right),\, kN\right\}\right)$ for $k \le \sqrt[3]{N}$. In the cache oblivious setting with tall cache assumption $M \ge B^{1+\varepsilon}$, the I/O complexity is $O\left(\frac{kN}{B}\left(1+\log_{M/B}\frac{N}{k}\right)\right)$ for A in column major layout.
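
As an illustration only, the following Python sketch shows the SpMV task on a matrix whose nonzeros are listed column by column, and evaluates the column major bound, read here as min{(kN/B)(1 + log_{M/B}(N / max{M, k})), kN}, with all constant factors dropped. The function names, the (row, column, value) triple encoding, and the clamping of a negative logarithm to zero are assumptions made for this sketch and are not taken from the paper. The code performs the arithmetic in ordinary memory and does not model block transfers.

```python
import math


def spmv_column_major(n, triples, x):
    """Compute y = A x for a sparse n x n matrix A.

    `triples` holds (row, col, value) entries ordered by column, a
    hypothetical stand-in for column major layout.
    """
    y = [0.0] * n
    for row, col, value in triples:
        # Each nonzero a_ij contributes one product a_ij * x_j to y_i.
        y[row] += value * x[col]
    return y


def column_major_bound(n, k, m, b):
    """Evaluate min{(kN/B)(1 + log_{M/B}(N / max{M, k})), kN}.

    Constant factors are ignored. Clamping the logarithm at zero is an
    assumption of this sketch, made so that small N gives a sane value.
    """
    log_term = max(0.0, math.log(n / max(m, k), m / b))
    sorting_like = (k * n / b) * (1 + log_term)
    return min(sorting_like, k * n)


# A 3 x 3 example with nonzeros a_00 = 2, a_20 = 1, a_12 = 3.
triples = [(0, 0, 2.0), (2, 0, 1.0), (1, 2, 3.0)]
y = spmv_column_major(3, triples, [1, 2, 3])
print(y)  # [2.0, 9.0, 1.0]

# Hypothetical parameters: N = 2^20, k = 4, M = 2^10, B = 2^5.
print(column_major_bound(2**20, 4, 2**10, 2**5))
```

With these hypothetical parameters the first term of the minimum is far smaller than kN, so the sorting-like term determines the value; for very small B or very large k the direct term kN takes over.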