Implementation and optimization of sparse matrix-vector multiplication on Imagine stream processor

  • Authors:
  • Li Wang;Xue Jun Yang;Gui Bin Wang;Xiao Bo Yan;Yu Deng;Jing Du;Ying Zhang;Tao Tang;Kun Zeng

  • Affiliations:
  • National Laboratory for Parallel and Distributed Processing, School of Computer, National University of Defense Technology, Changsha, Hunan, P.R. China (all authors)

  • Venue:
  • ISPA'07: Proceedings of the 5th International Conference on Parallel and Distributed Processing and Applications
  • Year:
  • 2007

Abstract

Sparse matrix-vector multiplication (SpMV) dominates the performance of many scientific and engineering applications. However, it tends to run much more slowly than its dense counterpart because its algorithms have poor temporal and spatial locality and its memory access patterns are irregular. Its performance depends heavily on both the nonzero structure of the sparse matrix and the machine architecture. In this paper, we address the problem of implementing and optimizing SpMV on the Imagine stream processor. We present three classes of implementation algorithms based on different key ideas: the first two highlight different aspects of the underlying stream architecture, and the third is inspired by the vector implementation of SpMV. We then discuss some critical optimizations. Experimental results on the same benchmarks show that we achieve an average relative improvement of up to 67 percent over the previously published evaluation.
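
As a point of reference for the irregular memory accesses mentioned in the abstract, the following is a minimal scalar sketch of SpMV. It assumes the common compressed sparse row (CSR) layout, which the abstract does not name, and it is not one of the paper's three stream algorithms. The function name `spmv_csr` and the arrays `row_ptr`, `col_idx`, and `values` are illustrative choices.

```python
def spmv_csr(row_ptr, col_idx, values, x):
    """Compute y = A * x for a sparse matrix A stored in CSR form.

    Assumed layout (not taken from the paper):
      row_ptr[i] .. row_ptr[i + 1] delimits the nonzeros of row i,
      col_idx[k] is the column of the k-th nonzero,
      values[k]  is its numerical value.
    """
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            # values and col_idx are read sequentially, but x is gathered
            # through col_idx: this indirect access is the irregular part.
            acc += values[k] * x[col_idx[k]]
        y[i] = acc
    return y


# Small 3x3 example:
# [[4, 0, 1],
#  [0, 2, 0],
#  [3, 0, 5]]
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
values = [4.0, 1.0, 2.0, 3.0, 5.0]
x = [1.0, 2.0, 3.0]
y = spmv_csr(row_ptr, col_idx, values, x)
print(y)  # [7.0, 4.0, 18.0]
```

Each nonzero contributes one multiply-add but requires loading a value, a column index, and a gathered element of `x`, which is why the nonzero structure of the matrix has such a strong effect on performance.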