Exploiting dense substructures for fast sparse matrix vector multiplication

  • Authors:
  • Manu Shantharam, Anirban Chatterjee, Padma Raghavan

  • Affiliations:
  • Department of Computer Science and Engineering, The Pennsylvania State University, University Park, PA, USA (all authors)

  • Venue:
  • International Journal of High Performance Computing Applications
  • Year:
  • 2011

Abstract

The execution time of many scientific computing applications is dominated by the time spent in performing sparse matrix vector multiplication (SMV; y ← A · x). We consider improving the performance of SMV on multicores by exploiting the dense substructures that are inherently present in many sparse matrices derived from partial differential equation models. First, we identify indistinguishable vertices, i.e., vertices with the same adjacency structure, in a graph representation of the sparse matrix (A) and group them into a supernode. Next, we identify effectively dense blocks within the matrix by grouping rows and columns in each supernode. Finally, by using a suitable data structure for this representation of the matrix, we reduce the number of load operations during SMV while exactly preserving the original sparsity structure of A. In addition, we use ordering techniques to enhance locality in accesses to the vector, x, to yield an SMV kernel that exploits the effectively dense substructures in the matrix. We evaluate our scheme on Intel Nehalem and AMD Shanghai processors. We observe that for larger matrices on the Intel Nehalem processor, our method improves performance on average by 37.35% compared with the traditional compressed sparse row scheme (a blocked compressed form improves performance on average by 30.27%). Benefits of our new format are similar for the AMD processor. More importantly, if we pick for each matrix the best among our method and the blocked compressed scheme, the average performance improvements increase to 40.85%. Additional results indicate that the best performing scheme varies depending on the matrix and the system. We therefore propose an effective density measure that could be used for method selection, thus adding to the variety of options for an auto-tuned optimized SMV kernel that can exploit sparse matrix properties and hardware attributes for high performance.
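
As a rough illustration of the supernode idea described in the abstract (not the authors' data structure or implementation), the C sketch below treats a maximal run of consecutive CSR rows with an identical column pattern as a supernode and reuses each column index and x-value across the whole group during y = A · x. All names here (`CSR`, `same_pattern`, `spmv_supernodal`) are hypothetical, and the on-the-fly pattern comparison stands in for what would, in practice, be a one-time preprocessing pass; the paper additionally reorders rows and columns so that indistinguishable vertices become adjacent.

```c
#include <stdio.h>

/* Compressed sparse row (CSR) storage. */
typedef struct {
    int n;          /* number of rows                                       */
    int *row_ptr;   /* row i occupies entries row_ptr[i] .. row_ptr[i+1]-1  */
    int *col_idx;   /* column index of each stored nonzero                  */
    double *val;    /* value of each stored nonzero                         */
} CSR;

/* Rows r and s are "indistinguishable" here if they store the same column
   pattern (assumes indices within each row are kept in a consistent,
   e.g. sorted, order). */
static int same_pattern(const CSR *A, int r, int s)
{
    int lr = A->row_ptr[r + 1] - A->row_ptr[r];
    int ls = A->row_ptr[s + 1] - A->row_ptr[s];
    if (lr != ls) return 0;
    for (int k = 0; k < lr; k++)
        if (A->col_idx[A->row_ptr[r] + k] != A->col_idx[A->row_ptr[s] + k])
            return 0;
    return 1;
}

/* y = A * x, loading each column index and x entry once per supernode
   (a run of consecutive rows with identical patterns) rather than once
   per row. */
void spmv_supernodal(const CSR *A, const double *x, double *y)
{
    int i = 0;
    while (i < A->n) {
        int j = i + 1;
        while (j < A->n && same_pattern(A, i, j)) j++;  /* supernode: rows i..j-1 */
        int len = A->row_ptr[i + 1] - A->row_ptr[i];
        for (int r = i; r < j; r++) y[r] = 0.0;
        for (int k = 0; k < len; k++) {
            int c = A->col_idx[A->row_ptr[i] + k];      /* one index load ...     */
            double xc = x[c];                           /* ... one x load ...     */
            for (int r = i; r < j; r++)                 /* ... shared by all rows */
                y[r] += A->val[A->row_ptr[r] + k] * xc;
        }
        i = j;
    }
}

int main(void)
{
    /* Small 4x4 example; rows 0 and 1 share the column pattern {0, 2}. */
    int row_ptr[] = {0, 2, 4, 5, 7};
    int col_idx[] = {0, 2, 0, 2, 1, 2, 3};
    double val[]  = {1, 2, 3, 4, 5, 6, 7};
    CSR A = {4, row_ptr, col_idx, val};
    double x[] = {1, 1, 1, 1}, y[4];

    spmv_supernodal(&A, x, y);
    for (int r = 0; r < 4; r++)
        printf("y[%d] = %g\n", r, y[r]);  /* expect 3 7 5 13 */
    return 0;
}
```

The point of the sketch is the inner loop: each `col_idx` load and each `x[c]` load is amortized over every row in the supernode, which is the source of the load-count reduction the abstract describes, while the stored values and sparsity structure of A remain exactly as in plain CSR.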