CRSD: application specific auto-tuning of SpMV for diagonal sparse matrices

  • Authors:
  • Xiangzheng Sun;Yunquan Zhang;Ting Wang;Guoping Long;Xianyi Zhang;Yan Li

  • Affiliations:
  • Institute of Software, Chinese Academy of Sciences. Graduate University of Chinese Academy of Sciences (all six authors)

  • Venue:
  • Euro-Par'11 Proceedings of the 17th international conference on Parallel processing - Volume Part II
  • Year:
  • 2011

Abstract

Sparse matrix-vector multiplication (SpMV) is an important computational kernel in scientific applications. Its performance depends heavily on the distribution of nonzeros in the sparse matrix. In this paper, we propose a new storage format for diagonal sparse matrices, called Compressed Row Segment with Diagonal-pattern (CRSD). We design diagonal patterns to represent the diagonal distribution. Because matrices from the same application have similar diagonal distributions, some diagonal patterns remain unchanged across them. First, we sample one matrix to obtain the unchanged diagonal patterns. Next, optimal SpMV codelets are generated automatically for those diagonal patterns. Finally, we combine the generated codelets into the optimal SpMV implementation. In addition, the information collected during the auto-tuning process is used in the parallel implementation to achieve load balance. Experimental results show that CRSD achieves speedups of up to 2.37× (1.70× on average) over the diagonal format (DIA) and up to 4.60× (2.10× on average) over compressed sparse row (CSR) with the same number of threads on two mainstream multi-core platforms.
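To make the terms in the abstract concrete, the following is a minimal sketch of a DIA-style SpMV (the baseline the abstract compares against) together with a simple grouping of consecutive rows that share the same set of occupied diagonals. It is not the CRSD format or the authors' code generator: the function names (`to_dia`, `spmv_dia`, `row_segments_by_pattern`) and the reading of a "diagonal pattern" as the tuple of offsets `col - row` present in a row are assumptions made purely for illustration.

```python
def to_dia(dense):
    """Convert a square dense matrix to a DIA-like dict: offset -> n values.

    data[off][i] holds A[i][i + off], or 0.0 where that column is out of range.
    (Hypothetical helper for illustration, not the paper's data layout.)
    """
    n = len(dense)
    offsets = sorted({j - i for i, row in enumerate(dense)
                      for j, v in enumerate(row) if v != 0})
    return {off: [dense[i][i + off] if 0 <= i + off < n else 0.0
                  for i in range(n)]
            for off in offsets}


def spmv_dia(data, x):
    """Compute y = A @ x by walking each stored diagonal once."""
    n = len(x)
    y = [0.0] * n
    for off, diag in data.items():
        lo = max(0, -off)        # first row where column i + off is valid
        hi = min(n, n - off)     # one past the last such row
        for i in range(lo, hi):
            y[i] += diag[i] * x[i + off]
    return y


def row_segments_by_pattern(dense):
    """Group consecutive rows that have the same set of diagonal offsets.

    Returns a list of (start_row, end_row_exclusive, offsets) tuples.
    This is an assumed, simplified notion of a "diagonal pattern".
    """
    segments = []
    for i, row in enumerate(dense):
        pattern = tuple(sorted(j - i for j, v in enumerate(row) if v != 0))
        if segments and segments[-1][2] == pattern:
            start, _, pat = segments[-1]
            segments[-1] = (start, i + 1, pat)   # extend the current segment
        else:
            segments.append((i, i + 1, pattern))
    return segments


# Small tridiagonal example matrix.
A = [[2, -1, 0, 0],
     [-1, 2, -1, 0],
     [0, -1, 2, -1],
     [0, 0, -1, 2]]
x = [1.0, 2.0, 3.0, 4.0]
y = spmv_dia(to_dia(A), x)
segments = row_segments_by_pattern(A)
```

In this sketch, rows with an identical offset tuple could share one specialized inner loop, which is roughly the intuition behind generating a codelet per pattern; how CRSD actually defines segments and codelets is specified in the paper itself, not here.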