The streamed storage format for sparse matrices has shown good performance improvements for sparse matrix-vector multiply (SpMV) compared with the compressed sparse row (CSR) and block CSR (BCSR) formats, particularly on IBM Power processors. We extend the format to exploit single instruction, multiple data (SIMD) instructions so that it can use the vector unit, and discuss how the streamed formats perform on the Power7 processor, the first eight-core chip from IBM. The streamed format is then applied to two further sparse matrix operations: successive over-relaxation (SOR) iteration sweeps and incomplete LU (ILU) triangular solves. Basic solvers for both are implemented in the high-performance computing (HPC) package PETSc. Test results on the IBM Power7 processor show that SIMD instructions improve the performance of the streamed storage format on SpMV, and that the format also accelerates SOR iteration sweeps and ILU triangular solves compared with the traditional BCSR format used in PETSc.
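For readers unfamiliar with the baseline, the following is a minimal sketch of the two CSR kernels that the abstract names as comparison points: SpMV and a forward SOR sweep. It does not show the streamed format, whose layout is not described here, and it is not PETSc code. The function names (`csr_spmv`, `csr_sor_sweep`), the array names, and the 3x3 example matrix are all assumptions chosen for illustration.

```python
def csr_spmv(row_ptr, col_idx, values, x):
    """Return y = A @ x for a matrix A stored in CSR form.

    row_ptr[i]..row_ptr[i+1] delimits the nonzeros of row i;
    col_idx[k] and values[k] give the column and value of nonzero k.
    """
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            # Indirect access to x through col_idx is what makes
            # CSR SpMV hard to vectorize and memory-bound.
            acc += values[k] * x[col_idx[k]]
        y[i] = acc
    return y


def csr_sor_sweep(row_ptr, col_idx, values, b, x, omega=1.0):
    """Apply one forward SOR sweep to A x = b, updating x in place.

    Assumes every row stores a nonzero diagonal entry.
    omega = 1.0 reduces the sweep to Gauss-Seidel.
    """
    n_rows = len(row_ptr) - 1
    for i in range(n_rows):
        diag = 0.0
        sigma = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            j = col_idx[k]
            if j == i:
                diag = values[k]
            else:
                # Uses already-updated x[j] for j < i, old x[j] for j > i.
                sigma += values[k] * x[j]
        x[i] = (1.0 - omega) * x[i] + omega * (b[i] - sigma) / diag
    return x


# Hypothetical example matrix (illustrative only):
# [[4, 1, 0],
#  [1, 4, 1],
#  [0, 1, 4]]
row_ptr = [0, 2, 5, 7]
col_idx = [0, 1, 0, 1, 2, 1, 2]
values = [4.0, 1.0, 1.0, 4.0, 1.0, 1.0, 4.0]
```

Calling `csr_spmv(row_ptr, col_idx, values, x)` multiplies the example matrix by a vector `x`, and repeated calls to `csr_sor_sweep` with a right-hand side `b` iterate toward the solution of the corresponding linear system. Both loops read `x` through `col_idx`, so memory accesses are irregular; this is the kind of access pattern that alternative storage layouts aim to improve.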