Exploiting dense substructures for fast sparse matrix vector multiplication

  • Authors:
  • Manu Shantharam, Anirban Chatterjee, Padma Raghavan

  • Affiliations:
  • Department of Computer Science and Engineering, The Pennsylvania State University, University Park, PA, USA (all authors)

  • Venue:
  • International Journal of High Performance Computing Applications
  • Year:
  • 2011

Abstract

The execution time of many scientific computing applications is dominated by the time spent in performing sparse matrix vector multiplication (SMV; y ← A · x). We consider improving the performance of SMV on multicores by exploiting the dense substructures that are inherently present in many sparse matrices derived from partial differential equation models. First, we identify indistinguishable vertices, i.e., vertices with the same adjacency structure, in a graph representation of the sparse matrix (A) and group them into a supernode. Next, we identify effectively dense blocks within the matrix by grouping rows and columns in each supernode. Finally, by using a suitable data structure for this representation of the matrix, we reduce the number of load operations during SMV while exactly preserving the original sparsity structure of A. In addition, we use ordering techniques to enhance locality in accesses to the vector, x, to yield an SMV kernel that exploits the effectively dense substructures in the matrix. We evaluate our scheme on Intel Nehalem and AMD Shanghai processors. We observe that for larger matrices on the Intel Nehalem processor, our method improves performance on average by 37.35% compared with the traditional compressed sparse row scheme (a blocked compressed form improves performance on average by 30.27%). Benefits of our new format are similar for the AMD processor. More importantly, if we pick for each matrix the best among our method and the blocked compressed scheme, the average performance improvements increase to 40.85%. Additional results indicate that the best performing scheme varies depending on the matrix and the system. We therefore propose an effective density measure that could be used for method selection, thus adding to the variety of options for an auto-tuned optimized SMV kernel that can exploit sparse matrix properties and hardware attributes for high performance.
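
As a rough illustration of the supernode idea described in the abstract (not the authors' data structure or implementation), the C sketch below treats a maximal run of consecutive CSR rows with an identical column pattern as a supernode and reuses each column index and x-value across the whole group during y = A · x. All names here (`CSR`, `same_pattern`, `spmv_supernodal`) are hypothetical, and the on-the-fly pattern comparison stands in for what would, in practice, be a one-time preprocessing pass; the paper additionally reorders rows and columns so that indistinguishable vertices become adjacent.

```c
#include <stdio.h>

/* Compressed sparse row (CSR) storage. */
typedef struct {
    int n;          /* number of rows                                       */
    int *row_ptr;   /* row i occupies entries row_ptr[i] .. row_ptr[i+1]-1  */
    int *col_idx;   /* column index of each stored nonzero                  */
    double *val;    /* value of each stored nonzero                         */
} CSR;

/* Rows r and s are "indistinguishable" here if they store the same column
   pattern (assumes indices within each row are kept in a consistent,
   e.g. sorted, order). */
static int same_pattern(const CSR *A, int r, int s)
{
    int lr = A->row_ptr[r + 1] - A->row_ptr[r];
    int ls = A->row_ptr[s + 1] - A->row_ptr[s];
    if (lr != ls) return 0;
    for (int k = 0; k < lr; k++)
        if (A->col_idx[A->row_ptr[r] + k] != A->col_idx[A->row_ptr[s] + k])
            return 0;
    return 1;
}

/* y = A * x, loading each column index and x entry once per supernode
   (a run of consecutive rows with identical patterns) rather than once
   per row. */
void spmv_supernodal(const CSR *A, const double *x, double *y)
{
    int i = 0;
    while (i < A->n) {
        int j = i + 1;
        while (j < A->n && same_pattern(A, i, j)) j++;  /* supernode: rows i..j-1 */
        int len = A->row_ptr[i + 1] - A->row_ptr[i];
        for (int r = i; r < j; r++) y[r] = 0.0;
        for (int k = 0; k < len; k++) {
            int c = A->col_idx[A->row_ptr[i] + k];      /* one index load ...     */
            double xc = x[c];                           /* ... one x load ...     */
            for (int r = i; r < j; r++)                 /* ... shared by all rows */
                y[r] += A->val[A->row_ptr[r] + k] * xc;
        }
        i = j;
    }
}

int main(void)
{
    /* Small 4x4 example; rows 0 and 1 share the column pattern {0, 2}. */
    int row_ptr[] = {0, 2, 4, 5, 7};
    int col_idx[] = {0, 2, 0, 2, 1, 2, 3};
    double val[]  = {1, 2, 3, 4, 5, 6, 7};
    CSR A = {4, row_ptr, col_idx, val};
    double x[] = {1, 1, 1, 1}, y[4];

    spmv_supernodal(&A, x, y);
    for (int r = 0; r < 4; r++)
        printf("y[%d] = %g\n", r, y[r]);  /* expect 3 7 5 13 */
    return 0;
}
```

The point of the sketch is the inner loop: each `col_idx` load and each `x[c]` load is amortized over every row in the supernode, which is the source of the load-count reduction the abstract describes, while the stored values and sparsity structure of A remain exactly as in plain CSR.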