The Sparse Matrix-Vector Multiplication (SpMxV) kernel exhibits poor scaling on shared memory systems due to the streaming nature of its data access pattern. To reduce memory contention and improve kernel performance, we propose two compression schemes: CSR-DU, which shrinks the matrix's structural data through coarse-grained delta encoding, and CSR-VI, which shrinks the value data through indirect indexing and is applicable to matrices with a small number of unique values. A thorough experimental evaluation of the proposed methods, and of their combination, on two modern shared memory systems demonstrates that they can significantly improve multithreaded SpMxV performance over both standard and state-of-the-art approaches.