Performance evaluation of the sparse matrix-vector multiplication on modern architectures

Authors:
Georgios Goumas; Kornilios Kourtis; Nikos Anastopoulos; Vasileios Karakasis; Nectarios Koziris
Affiliations:
Computing Systems Laboratory, School of Electrical and Computer Engineering, National Technical University of Athens, Zografou, Greece 15780
Venue:
The Journal of Supercomputing
Year:
2009

Abstract

In this paper, we revisit the performance issues of the widely used sparse matrix-vector multiplication (SpMxV) kernel on modern microarchitectures. Previous work reports a number of factors that may significantly reduce performance. However, the interaction of these factors with the underlying architectural characteristics is not clearly understood, which may lead to misguided, and thus unsuccessful, optimization attempts. To gain insight into the details of SpMxV performance, we conduct a suite of experiments on a rich set of matrices on three different commodity hardware platforms. In addition, we investigate the parallel version of the kernel and report the corresponding performance results and their relation to each architecture's specific multithreaded configuration. Based on our experiments, we draw conclusions that can serve as guidelines for optimizing both the single-threaded and multithreaded versions of the kernel.
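
For readers unfamiliar with the kernel, the following is a minimal sketch of SpMxV, y = A·x. The abstract does not name a storage format, so the sketch assumes the common compressed sparse row (CSR) layout. The function name `spmv_csr` and the array names `row_ptr`, `col_idx`, and `values` are illustrative choices, not taken from the paper.

```python
from typing import List, Sequence


def spmv_csr(row_ptr: Sequence[int], col_idx: Sequence[int],
             values: Sequence[float], x: Sequence[float]) -> List[float]:
    """Compute y = A * x for a matrix A stored in CSR form (assumed layout)."""
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        # The nonzeros of row i occupy positions row_ptr[i] .. row_ptr[i + 1] - 1.
        for k in range(row_ptr[i], row_ptr[i + 1]):
            # x is read indirectly through col_idx, so the access pattern
            # depends on the sparsity structure of the matrix.
            acc += values[k] * x[col_idx[k]]
        y[i] = acc
    return y


# Small illustrative matrix (made up for this sketch):
# A = [[4, 0, 1],
#      [0, 3, 0],
#      [2, 0, 5]]
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
values = [4.0, 1.0, 3.0, 2.0, 5.0]
x = [1.0, 2.0, 3.0]
y = spmv_csr(row_ptr, col_idx, values, x)
print(y)
```

Each output row is computed independently in this sketch, so one plausible way to parallelize it is to assign disjoint row ranges to threads. Whether the paper partitions the work this way is not stated in the abstract.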