This paper introduces a storage format for sparse matrices, called compressed sparse blocks (CSB), which allows both Ax and Aᵀx to be computed efficiently in parallel, where A is an n×n sparse matrix with nnz nonzeros and x is a dense n-vector. Our algorithms use Θ(nnz) work (serial running time) and Θ(√n lg n) span (critical-path length), yielding a parallelism of Θ(nnz/(√n lg n)), which is amply high for virtually any large matrix. The storage requirement for CSB is the same as that for the more standard compressed-sparse-rows (CSR) format, in which computing Ax in parallel is easy but Aᵀx is difficult. Benchmark results indicate that on one processor, the CSB algorithms for Ax and Aᵀx run just as fast as the CSR algorithm for Ax, but the CSB algorithms also scale up linearly with the number of processors until limited by off-chip memory bandwidth.
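The key idea the abstract describes is that one blocked layout serves both products: nonzeros are grouped into β×β blocks and stored with block-local coordinates, so Ax and Aᵀx traverse exactly the same data. The real CSB implementation additionally orders nonzeros within a block in Z-Morton order and parallelizes recursively with Cilk++; the serial Python sketch below (the class and method names `CSB` and `spmv` are my own, not from the paper) only illustrates the storage symmetry.

```python
class CSB:
    """Toy compressed-sparse-blocks store.

    The n x n matrix is partitioned into beta x beta blocks; each nonzero
    is kept as a (local_row, local_col, value) triple inside its block.
    Because blocks are square, A*x and A^T*x read the identical layout,
    only swapping the roles of row and column indices.
    """

    def __init__(self, n, triples, beta):
        self.n, self.beta = n, beta
        # (block_row, block_col) -> list of (local_row, local_col, value)
        self.blocks = {}
        for i, j, v in triples:
            key = (i // beta, j // beta)
            self.blocks.setdefault(key, []).append((i % beta, j % beta, v))

    def spmv(self, x, transpose=False):
        """Serial y = A*x, or y = A^T*x when transpose=True."""
        y = [0.0] * self.n
        b = self.beta
        for (br, bc), trips in self.blocks.items():
            for r, c, v in trips:
                if transpose:
                    # Block (br, bc) of A is block (bc, br) of A^T,
                    # with local indices swapped.
                    y[bc * b + c] += v * x[br * b + r]
                else:
                    y[br * b + r] += v * x[bc * b + c]
        return y
```

For example, for A = [[1, 2], [0, 3]] and x = (1, 1), `spmv` returns Ax = (3, 3) and Aᵀx = (1, 5) from the same block structure, which is the property that makes CSB attractive when both products are needed (as in many iterative solvers).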