Reordering sparse matrices is a common strategy for improving the performance of sparse matrix operations on CPUs, in particular sparse matrix-vector multiplication (SpMV). In this paper, we evaluate some of the most successful reordering techniques on two different GPUs. The study also considers several sparse matrix storage formats, and runs were performed in both single- and double-precision arithmetic. We find that SpMV performance on GPUs is very sensitive to reordering, and we identify several characteristics of the reordered matrices that have a large impact on that performance. In most cases, the reordered matrices outperform the original ones, with speedups of up to 2.6x. We also observe that no single storage format is consistently preferred over the others.
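
To make the idea of reordering before SpMV concrete, the following is a minimal CPU-side sketch in pure Python. It is not the paper's GPU implementation. The abstract does not name the reordering techniques or storage formats that were evaluated, so the choices here are assumptions made for illustration: reverse Cuthill-McKee (RCM) as one well-known bandwidth-reducing reordering, and CSR as one common storage format. All function names are hypothetical. The sketch shows that a symmetric permutation leaves the SpMV result unchanged up to the same permutation, while it can change structural properties of the matrix such as its bandwidth.

```python
from collections import deque


def spmv_csr(indptr, indices, data, x):
    """Compute y = A @ x for a matrix A stored in CSR form."""
    y = [0.0] * (len(indptr) - 1)
    for row in range(len(y)):
        acc = 0.0
        for k in range(indptr[row], indptr[row + 1]):
            acc += data[k] * x[indices[k]]
        y[row] = acc
    return y


def permute_symmetric(indptr, indices, data, perm):
    """Return CSR arrays of B, where B[i][j] = A[perm[i]][perm[j]]."""
    inverse = [0] * len(perm)
    for new, old in enumerate(perm):
        inverse[old] = new
    new_indptr, new_indices, new_data = [0], [], []
    for old_row in perm:
        # Relabel the columns of the old row and keep them sorted.
        entries = sorted(
            (inverse[indices[k]], data[k])
            for k in range(indptr[old_row], indptr[old_row + 1])
        )
        for col, val in entries:
            new_indices.append(col)
            new_data.append(val)
        new_indptr.append(len(new_indices))
    return new_indptr, new_indices, new_data


def bandwidth(indptr, indices):
    """Largest distance |row - col| over all stored entries."""
    return max(
        (abs(row - indices[k])
         for row in range(len(indptr) - 1)
         for k in range(indptr[row], indptr[row + 1])),
        default=0,
    )


def reverse_cuthill_mckee(indptr, indices):
    """Simple RCM ordering; assumes a structurally symmetric matrix."""
    n = len(indptr) - 1
    degree = [indptr[i + 1] - indptr[i] for i in range(n)]
    visited = [False] * n
    order = []
    # Start each connected component from a vertex of lowest degree.
    for start in sorted(range(n), key=lambda v: degree[v]):
        if visited[start]:
            continue
        visited[start] = True
        queue = deque([start])
        while queue:
            v = queue.popleft()
            order.append(v)
            neighbours = [indices[k] for k in range(indptr[v], indptr[v + 1])
                          if not visited[indices[k]]]
            for w in sorted(neighbours, key=lambda u: degree[u]):
                if not visited[w]:
                    visited[w] = True
                    queue.append(w)
    return order[::-1]


# Illustrative 6x6 symmetric matrix (made-up data): a chain 0-5-1-4-2-3
# with 4.0 on the diagonal and -1.0 on each off-diagonal entry.
indptr = [0, 2, 5, 8, 10, 13, 16]
indices = [0, 5, 1, 4, 5, 2, 3, 4, 2, 3, 1, 2, 4, 0, 1, 5]
data = [4.0, -1.0, 4.0, -1.0, -1.0, 4.0, -1.0, -1.0,
        -1.0, 4.0, -1.0, -1.0, 4.0, -1.0, -1.0, 4.0]
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]

perm = reverse_cuthill_mckee(indptr, indices)
r_indptr, r_indices, r_data = permute_symmetric(indptr, indices, data, perm)

y = spmv_csr(indptr, indices, data, x)
# The input vector must be permuted in the same way as the matrix.
y_reordered = spmv_csr(r_indptr, r_indices, r_data, [x[p] for p in perm])
```

For this small matrix the bandwidth drops from 5 to 1, and `y_reordered[i]` equals `y[perm[i]]` for every `i`. Bandwidth is only one structural property that reordering affects; which characteristics of the reordered matrices matter most for GPU performance is what the study itself examines.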