The Sparse Matrix-Vector Multiplication (SpMxV) kernel exhibits poor scaling on shared memory systems due to the streaming nature of its data access pattern. To reduce memory contention and improve kernel performance, we propose two compression schemes: CSR-DU, which shrinks the matrix's structural data through coarse-grained delta encoding, and CSR-VI, which shrinks the value data through indirect indexing and is applicable to matrices with a small number of unique values. A thorough experimental evaluation of the proposed methods, and of their combination, on two modern shared memory systems demonstrates that they can significantly improve multithreaded SpMxV performance over both standard and state-of-the-art approaches.