A wide range of applications in engineering and scientific computing depend on accelerating the sparse matrix-vector product (SpMV). Graphics Processing Units (GPUs) have recently emerged as platforms that yield outstanding acceleration factors, and several SpMV implementations for GPUs have already been proposed. This work focuses on the ELLR-T algorithm for computing SpMV on GPU architectures; its performance depends strongly on the optimal selection of two parameters. Because memory operations dominate the performance of ELLR-T, an analytical model is proposed to auto-tune ELLR-T for a particular combination of sparse matrix and GPU architecture. Evaluation on a representative set of test matrices shows that the average performance of ELLR-T auto-tuned with the proposed model is close to the optimum. A comparative analysis against a variety of previous proposals shows that ELLR-T with the estimated configuration achieves the best performance on the GPU architecture for this set of test matrices.
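As a rough illustration of the kind of kernel being tuned, the sketch below emulates an ELLPACK-R style SpMV on the CPU. The abstract does not define the storage format or name the two parameters, so everything here is an assumption: the padded `values`/`cols` arrays plus a `row_len` array follow the usual ELLPACK-R description, `T` stands for the number of threads cooperating on one row, and the function names are hypothetical. The second parameter commonly associated with ELLR-T, the thread block size, has no effect in a sequential emulation and is left out.

```python
def to_ellpack_r(dense):
    """Convert a dense matrix (list of rows) to ELLPACK-R style arrays.

    Assumed layout: every row is padded to the length of the longest row,
    and row_len records how many entries of each row are real nonzeros.
    """
    row_len = [sum(1 for v in row if v != 0) for row in dense]
    width = max(row_len) if row_len else 0
    values, cols = [], []
    for row in dense:
        nz = [(j, v) for j, v in enumerate(row) if v != 0]
        pad = width - len(nz)
        cols.append([j for j, _ in nz] + [0] * pad)      # padding column index
        values.append([v for _, v in nz] + [0.0] * pad)  # padding value
    return values, cols, row_len


def spmv_ellr_t(values, cols, row_len, x, T=2):
    """Compute y = A x, emulating T cooperating threads per row."""
    y = []
    for i, n in enumerate(row_len):
        partial = [0.0] * T
        for t in range(T):
            # "Thread" t handles entries t, t+T, t+2T, ... of row i and
            # stops at row_len[i], so padding is never read.
            for k in range(t, n, T):
                partial[t] += values[i][k] * x[cols[i][k]]
        # On a GPU this would be a reduction across the T threads.
        y.append(sum(partial))
    return y


A = [[4.0, 0.0, 1.0],
     [0.0, 2.0, 0.0],
     [3.0, 0.0, 5.0]]
x = [1.0, 2.0, 3.0]

values, cols, row_len = to_ellpack_r(A)
y = spmv_ellr_t(values, cols, row_len, x, T=2)
print(y)
```

On a real GPU the padded arrays would typically be stored column by column so that neighbouring threads read neighbouring addresses. The best value of `T` then depends on the row-length distribution of the matrix and on the device, which is the kind of choice the proposed model is meant to automate.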