Sparse matrix-vector multiplication (SpMV) dominates the performance of many scientific and engineering applications. However, it tends to run much more slowly than its dense counterpart because it has poor temporal and spatial locality and its memory access patterns are irregular. Its performance depends heavily on both the nonzero structure of the sparse matrix and the machine architecture. In this paper, we address the problem of implementing and optimizing SpMV on the Imagine stream processor. We present three classes of implementation algorithms based on different key ideas: the first two highlight different aspects of the underlying stream architecture, and the third is inspired by the vector implementation of SpMV. We then discuss some critical optimizations. Experimental results on the same benchmarks show that we achieve up to an average 67 percent relative improvement over the published evaluation.
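To make the irregular access pattern concrete, the following is a minimal reference sketch of SpMV. It assumes the common compressed sparse row (CSR) layout, which the abstract does not name, and it is not one of the three Imagine implementations described in the paper. The function name `spmv_csr`, the array names, and the 3x3 example matrix are hypothetical and chosen for illustration only.

```python
def spmv_csr(row_ptr, col_idx, values, x):
    """Compute y = A @ x for a sparse matrix A stored in CSR form (assumed layout)."""
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        # The nonzeros of row i occupy values[row_ptr[i]:row_ptr[i + 1]].
        for k in range(row_ptr[i], row_ptr[i + 1]):
            # Indirect load: col_idx[k] selects which element of x is read,
            # so accesses to x follow the nonzero structure of the matrix
            # rather than a regular stride.
            acc += values[k] * x[col_idx[k]]
        y[i] = acc
    return y


# Hypothetical 3x3 example matrix:
# [[4, 0, 1],
#  [0, 2, 0],
#  [3, 0, 5]]
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
values = [4.0, 1.0, 2.0, 3.0, 5.0]
x = [1.0, 2.0, 3.0]
y = spmv_csr(row_ptr, col_idx, values, x)
```

In this sketch `values` and `col_idx` are read sequentially, while `x` is read through an index, which is where the dependence on the nonzero structure comes from.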