The input/output complexity of sorting and related problems
Communications of the ACM
Asymptotically Tight Bounds for Performing BMMC Permutations on Parallel Disk Systems
SIAM Journal on Computing
External memory algorithms and data structures
External memory algorithms
On showing lower bounds for external-memory computational geometry problems
External memory algorithms
A survey of out-of-core algorithms in numerical linear algebra
External memory algorithms
PSBLAS: a library for parallel linear algebra computation on sparse matrices
ACM Transactions on Mathematical Software (TOMS)
FOCS '99 Proceedings of the 40th Annual Symposium on Foundations of Computer Science
I/O complexity: The red-blue pebble game
STOC '81 Proceedings of the thirteenth annual ACM symposium on Theory of computing
Optimizing the performance of sparse matrix-vector multiplication
Multi-linear formulas for permanent and determinant are of super-polynomial size
STOC '04 Proceedings of the thirty-sixth annual ACM symposium on Theory of computing
Automatic performance tuning of sparse matrix kernels
Cache-aware and cache-oblivious adaptive sorting
ICALP'05 Proceedings of the 32nd international conference on Automata, Languages and Programming
Provably good multicore cache performance for divide-and-conquer algorithms
Proceedings of the nineteenth annual ACM-SIAM symposium on Discrete algorithms
Communication-optimal parallel and sequential Cholesky decomposition: extended abstract
Proceedings of the twenty-first annual symposium on Parallelism in algorithms and architectures
Low depth cache-oblivious algorithms
Proceedings of the twenty-second annual ACM symposium on Parallelism in algorithms and architectures
Evaluating non-square sparse bilinear forms on multiple vector pairs in the I/O-model
MFCS'10 Proceedings of the 35th international conference on Mathematical foundations of computer science
Graph expansion and communication costs of fast matrix multiplication: regular submission
Proceedings of the twenty-third annual ACM symposium on Parallelism in algorithms and architectures
The I/O complexity of sparse matrix dense matrix multiplication
LATIN'10 Proceedings of the 9th Latin American conference on Theoretical Informatics
Managing data-movement for effective shared-memory parallelization of out-of-core sparse solvers
SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles
ACM SIGOPS 24th Symposium on Operating Systems Principles
X-Stream: edge-centric graph processing using streaming partitions
Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles
We analyze the problem of sparse-matrix dense-vector multiplication (SpMV) in the I/O-model. The task of SpMV is to compute $y := Ax$, where $A$ is a sparse $N \times N$ matrix and $x$ and $y$ are vectors. Sparsity is expressed by the parameter $k$: $A$ has at most $kN$ nonzeros in total, i.e., an average of $k$ nonzeros per column. The extreme choices of $k$ are well-studied special cases, namely permuting for $k = 1$ and dense matrix-vector multiplication for $k = N$. We study the worst-case complexity of this computational task, i.e., the best possible upper bound on the number of I/Os that depends on $k$ and $N$ only. We determine this complexity up to a constant factor for large ranges of the parameters. By our arguments, we find that most matrices with $kN$ nonzeros require this number of I/Os, even if the program may depend on the structure of the matrix. The model of computation for the lower bound is a combination of the I/O-models of Aggarwal and Vitter, and of Hong and Kung. We study two variants of the problem, depending on the memory layout of $A$. If $A$ is stored in column major layout, SpMV has I/O complexity $\Theta\left(\min\left\{\frac{kN}{B}\left(1+\log_{M/B}\frac{N}{\max\{M,k\}}\right),\, kN\right\}\right)$ for $k \le N^{1-\varepsilon}$ and any constant $1 > \varepsilon > 0$. If the algorithm can choose the memory layout, the I/O complexity of SpMV is $\Theta\left(\min\left\{\frac{kN}{B}\left(1+\log_{M/B}\frac{N}{kM}\right),\, kN\right\}\right)$ for $k \le \sqrt[3]{N}$. In the cache oblivious setting with tall cache assumption $M \ge B^{1+\varepsilon}$, the I/O complexity is $O\left(\frac{kN}{B}\left(1+\log_{M/B}\frac{N}{k}\right)\right)$ for $A$ in column major layout.
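To make the task and the column major bound concrete, here is a minimal Python sketch. It is an illustration only, not the algorithm analyzed in the paper: the function names `spmv_column_major` and `column_major_bound` are hypothetical, the matrix is assumed to be given as (row, column, value) triples sorted by column, and $M$ and $B$ are taken in their usual I/O-model meaning of memory size and block size. The bound is evaluated without its hidden constant factor, and the logarithmic term is clamped at zero, which is an assumption made here so that the expression stays meaningful when $N < \max\{M, k\}$.

```python
import math


def spmv_column_major(n, triples, x):
    """Compute y = A x for an n x n sparse matrix A.

    `triples` is a list of (row, col, value) entries, assumed to be
    sorted by column (column major layout). This in-memory loop only
    illustrates the task; it does not model block transfers.
    """
    y = [0.0] * n
    for row, col, value in triples:
        y[row] += value * x[col]
    return y


def column_major_bound(N, k, M, B):
    """Evaluate min{ kN/B * (1 + log_{M/B}(N / max{M, k})), kN }.

    Constant factors are ignored. Clamping the logarithm at 0 is an
    assumption of this sketch, not part of the stated bound.
    """
    log_term = max(0.0, math.log(N / max(M, k), M / B))
    return min((k * N / B) * (1.0 + log_term), k * N)


# A = [[1, 0, 2],
#      [0, 3, 0],
#      [4, 0, 5]] listed column by column.
example_triples = [(0, 0, 1.0), (2, 0, 4.0), (1, 1, 3.0), (0, 2, 2.0), (2, 2, 5.0)]
example_y = spmv_column_major(3, example_triples, [1.0, 2.0, 3.0])

# Hypothetical parameters: N = 2^20, k = 4, M = 2^10, B = 2^5.
example_bound = column_major_bound(2**20, 4, 2**10, 2**5)
```

With these hypothetical parameters the first term of the minimum, $\frac{kN}{B}(1 + 2)$, is far smaller than $kN$, so the sorting-like term determines the value; for very small $B$ or very large $k$ the direct bound $kN$ takes over.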