Optimizing Sparse Matrix-Vector Product Computations Using Unroll and Jam

Authors:
John Mellor-Crummey; John Garvin
Affiliations:
Department of Computer Science, Rice University, Houston, USA; Department of Computer Science, Rice University, Houston, USA
Venue:
International Journal of High Performance Computing Applications
Year:
2004

Abstract

Large-scale scientific applications frequently compute sparse matrix-vector products in their computational core. For this reason, techniques for computing sparse matrix-vector products efficiently on modern architectures are important. In this paper we describe a strategy for improving the performance of sparse matrix-vector product computations using a loop transformation known as unroll-and-jam. We describe a novel sparse matrix representation that enables us to apply this transformation. Our approach is best suited for sparse matrices whose rows have a small number of predictable lengths. This work was motivated by sparse matrices that arise in SAGE, an application from Los Alamos National Laboratory. We evaluate the performance benefits of our approach using sparse matrices produced by SAGE for a pair of sample inputs. We show that our strategy is effective for improving sparse matrix-vector product performance using these matrices on MIPS R12000, Alpha EV67, IBM Power3, and Itanium 2 processors. Our measurements show that, for this class of sparse matrices, our strategy improves sparse matrix-vector product performance by amounts ranging from a low of 41% on MIPS to well over a factor of 2 on Itanium.
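
To make the transformation concrete, the following is a minimal sketch of unroll-and-jam applied to a sparse matrix-vector product. It is not the representation described in the paper, which the abstract does not specify. It assumes a standard compressed sparse row (CSR) layout and a hypothetical preprocessing step, `group_rows_by_length`, that buckets rows by their nonzero count. Because two rows in the same bucket have the same length, the row loop can be unrolled by two and the two inner loops fused ("jammed") into one. All function names here are illustrative, and Python is used only for readability; any performance benefit would come from applying the same loop structure in compiled code.

```python
from collections import defaultdict


def spmv_csr(row_ptr, col_idx, vals, x):
    """Baseline CSR sparse matrix-vector product y = A x."""
    y = [0.0] * (len(row_ptr) - 1)
    for i in range(len(y)):
        s = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            s += vals[k] * x[col_idx[k]]
        y[i] = s
    return y


def group_rows_by_length(row_ptr):
    """Hypothetical preprocessing: bucket row indices by nonzero count."""
    groups = defaultdict(list)
    for i in range(len(row_ptr) - 1):
        groups[row_ptr[i + 1] - row_ptr[i]].append(i)
    return groups


def spmv_unroll_and_jam(row_ptr, col_idx, vals, x):
    """Same product, with the row loop unrolled by 2 and inner loops jammed."""
    y = [0.0] * (len(row_ptr) - 1)
    for length, rows in group_rows_by_length(row_ptr).items():
        for p in range(len(rows) // 2):
            r0, r1 = rows[2 * p], rows[2 * p + 1]
            b0, b1 = row_ptr[r0], row_ptr[r1]
            s0 = s1 = 0.0
            # Both rows hold `length` nonzeros, so one fused loop serves both.
            for k in range(length):
                s0 += vals[b0 + k] * x[col_idx[b0 + k]]
                s1 += vals[b1 + k] * x[col_idx[b1 + k]]
            y[r0], y[r1] = s0, s1
        if len(rows) % 2:
            # Cleanup for the leftover row of an odd-sized bucket.
            r = rows[-1]
            b = row_ptr[r]
            s = 0.0
            for k in range(length):
                s += vals[b + k] * x[col_idx[b + k]]
            y[r] = s
    return y


# A 5x4 example with two distinct row lengths (2 and 1).
row_ptr = [0, 2, 3, 5, 6, 7]
col_idx = [0, 1, 1, 0, 3, 2, 3]
vals = [2.0, 1.0, 3.0, 4.0, 5.0, 6.0, 1.0]
x = [1.0, 2.0, 3.0, 4.0]
print(spmv_unroll_and_jam(row_ptr, col_idx, vals, x))
```

In this sketch the fused inner loop keeps two independent accumulators live at once, which is the kind of register-level reuse and instruction-level parallelism that unroll-and-jam is meant to expose. The fewer distinct row lengths a matrix has, the larger the buckets and the less time is spent in cleanup code, which is consistent with the abstract's remark about rows with a small number of predictable lengths.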