Implementation and optimization of sparse matrix-vector multiplication on Imagine stream processor

Authors:
Li Wang;Xue Jun Yang;Gui Bin Wang;Xiao Bo Yan;Yu Deng;Jing Du;Ying Zhang;Tao Tang;Kun Zeng
Affiliations:
National Laboratory for Parallel and Distributed Processing, School of Computer, National University of Defense Technology, Changsha, Hunan, P.R. China (all authors)
Venue:
ISPA'07 Proceedings of the 5th international conference on Parallel and Distributed Processing and Applications
Year:
2007

Abstract

Sparse matrix-vector multiplication (SpMV) dominates the performance of many scientific and engineering applications. However, it tends to run much more slowly than its dense counterpart because its algorithms have poor temporal and spatial locality and its memory access patterns are irregular. Its performance depends heavily on both the nonzero structure of the sparse matrix and the machine architecture. In this paper, we address the problem of implementing and optimizing SpMV on the Imagine stream processor. We present three classes of implementation algorithms based on different key ideas: the first two highlight different aspects of the underlying stream architecture, and the third is inspired by the vector implementation of SpMV. We then discuss some critical optimizations. Experimental results on the same benchmarks show that we achieve an average relative improvement of up to 67 percent over the previously published evaluation.
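
To illustrate why SpMV has irregular memory access, the following is a minimal sketch of the standard scalar kernel over a compressed sparse row (CSR) layout. The CSR format, the function name `spmv_csr`, and the small 3x3 matrix are assumptions chosen for illustration; this is not one of the three stream algorithms the paper presents, which the abstract does not describe in detail.

```python
def spmv_csr(row_ptr, col_idx, values, x):
    """Compute y = A * x for a sparse matrix A stored in CSR form.

    row_ptr[i]..row_ptr[i+1] delimits the nonzeros of row i,
    col_idx[k] is the column of the k-th nonzero, values[k] its value.
    """
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for row in range(n_rows):
        acc = 0.0
        for k in range(row_ptr[row], row_ptr[row + 1]):
            # x is read through col_idx: an indirect, data-dependent
            # gather, which is the source of the poor locality.
            acc += values[k] * x[col_idx[k]]
        y[row] = acc
    return y


# Hypothetical example matrix:
# [[4, 0, 1],
#  [0, 2, 0],
#  [3, 0, 5]]
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
values = [4.0, 1.0, 2.0, 3.0, 5.0]
x = [1.0, 2.0, 3.0]

y = spmv_csr(row_ptr, col_idx, values, x)
print(y)
```

The matrix values and column indices are read sequentially, but the accesses to `x` follow whatever column pattern the matrix happens to have, so performance depends on the nonzero structure as the abstract notes.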