Sparse matrix-vector multiply (SpMV) is an important operation in a wide range of problems. One of the key factors determining its performance is sustained memory bandwidth. The IBM POWER architecture includes a hardware mechanism, the prefetch data stream, that can significantly increase sustained memory bandwidth. We have developed a new family of storage formats for sparse matrices that exploits this capability. Test results show that our new streamed storage formats can significantly improve the performance of sparse matrix-vector multiply on IBM POWER processors compared with the traditional compressed sparse row (CSR) and block compressed sparse row (BCSR) formats. The new formats also provide a benefit on x86 processors.