This work presents a simple vectorizable algorithm for sparse matrix-vector multiplication in the compressed sparse row (CSR) storage format. Unlike the vectorizable jagged diagonal (JAD) format, the algorithm requires no data rearrangement and can be easily adapted to a sophisticated library framework such as PETSc. Numerical experiments on the Cray X1 show an order-of-magnitude improvement over the non-vectorized algorithm.
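For readers unfamiliar with CSR, the following is a minimal sketch of the conventional, non-vectorized row-by-row product, that is, the kind of baseline the abstract compares against. It is not the vectorized algorithm of this work, whose details are not given here. The function name `csr_matvec` and the array names `row_ptr`, `col_idx`, and `values` are illustrative assumptions, not identifiers from the paper or from PETSc.

```python
def csr_matvec(row_ptr, col_idx, values, x):
    """Compute y = A @ x for a matrix A held in CSR form.

    row_ptr[i]..row_ptr[i+1] delimits the nonzeros of row i;
    col_idx[k] and values[k] give the column and value of nonzero k.
    (All names here are hypothetical, chosen for illustration.)
    """
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        # Inner loop length equals the number of nonzeros in row i,
        # which is typically short for sparse matrices.
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y[i] = acc
    return y


# Example matrix (dense view):
# [[4, 0, 1],
#  [0, 3, 0],
#  [2, 0, 5]]
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
values = [4.0, 1.0, 3.0, 2.0, 5.0]
x = [1.0, 2.0, 3.0]
y = csr_matvec(row_ptr, col_idx, values, x)
```

In this layout the inner loop runs only over the nonzeros of a single row, so its trip count is usually small. That is one commonly cited reason a straightforward CSR loop makes poor use of long vector registers.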