Improving the memory-system performance of sparse-matrix vector multiplication
IBM Journal of Research and Development
Performance optimizations and bounds for sparse matrix-vector multiply
Proceedings of the 2002 ACM/IEEE conference on Supercomputing
Sparsity: Optimization Framework for Sparse Matrix Kernels
International Journal of High Performance Computing Applications
Accelerating sparse matrix computations via data compression
Proceedings of the 20th annual international conference on Supercomputing
Optimization of sparse matrix-vector multiplication on emerging multicore platforms
Proceedings of the 2007 ACM/IEEE conference on Supercomputing
Optimizing sparse matrix-vector multiplication using index and value compression
Proceedings of the 5th conference on Computing frontiers
ICPP '08 Proceedings of the 2008 37th International Conference on Parallel Processing
Pattern-based sparse matrix representation for memory-efficient SMVM kernels
Proceedings of the 23rd international conference on Supercomputing
Proceedings of the twenty-first annual symposium on Parallelism in algorithms and architectures
Implementing sparse matrix-vector multiplication on throughput-oriented processors
Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis
Parallel symmetric sparse matrix-vector product on scalar multi-core CPUs
Parallel Computing
Hierarchical Diagonal Blocking and Precision Reduction Applied to Combinatorial Multigrid
Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis
LIKWID: A Lightweight Performance-Oriented Tool Suite for x86 Multicore Environments
ICPPW '10 Proceedings of the 2010 39th International Conference on Parallel Processing Workshops
Communications of the ACM
CSX: an extended compression format for spmv on shared memory systems
Proceedings of the 16th ACM symposium on Principles and practice of parallel programming
A Sparse Matrix Personality for the Convey HC-1
FCCM '11 Proceedings of the 2011 IEEE 19th Annual International Symposium on Field-Programmable Custom Computing Machines
The university of Florida sparse matrix collection
ACM Transactions on Mathematical Software (TOMS)
Reduced-Bandwidth Multithreaded Algorithms for Sparse Matrix-Vector Multiplication
IPDPS '11 Proceedings of the 2011 IEEE International Parallel & Distributed Processing Symposium
HICAMP: architectural support for efficient concurrency-safe shared structured data access
ASPLOS XVII Proceedings of the seventeenth international conference on Architectural Support for Programming Languages and Operating Systems
SMAT: an input adaptive auto-tuner for sparse matrix-vector multiplication
Proceedings of the 34th ACM SIGPLAN conference on Programming language design and implementation
Rethinking network stack design with memory snapshots
HotOS'13 Proceedings of the 14th USENIX conference on Hot Topics in Operating Systems
Hi-index | 0.00 |
Sparse matrix-vector multiply (SpMV) is a critical task in the inner loop of modern iterative linear system solvers and exhibits very little data reuse. This low reuse means that its performance is bounded by main-memory bandwidth. Moreover, the random patterns of indirection make it difficult to achieve this bound. We present sparse matrix storage formats based on deduplicated memory. These formats reduce memory traffic during SpMV and thus show significantly improved performance bounds: 90x better in the best case. Additionally, we introduce a matrix format that inherently exploits any amount of matrix symmetry and is at the same time fully compatible with non-symmetric matrix code. Because of this, our method can concurrently operate on a symmetric matrix without complicated work partitioning schemes and without any thread synchronization or locking. This approach takes advantage of growing processor caches, but incurs an instruction count overhead. It is feasible to overcome this issue by using specialized hardware as shown by the recently proposed Hierarchical Immutable Content-Addressable Memory Processor, or HICAMP architecture.