Automatic tuning of the sparse matrix vector product on GPUs based on the ELLR-T approach

Authors:
Francisco Vázquez; José Jesús Fernández; Ester M. Garzón
Affiliations:
Almeria University, Dpt Computer Architecture and Electronics, Ctra San Urbano s/n Cañada, 04120 Almeria, Spain;Almeria University, Dpt Computer Architecture and Electronics, Ctra San Urbano s/n Cañada, 04120 Almeria, Spain and Centro Nacional de Biotecnologia (CNB-CSIC), Darwin 3, Campus de Cantoblanc ...;Almeria University, Dpt Computer Architecture and Electronics, Ctra San Urbano s/n Cañada, 04120 Almeria, Spain
Venue:
Parallel Computing
Year:
2012

Citing 11
Cited 2

Sparse matrix computations on parallel processor arrays. SIAM Journal on Scientific Computing
Improving the memory-system performance of sparse-matrix vector multiplication. IBM Journal of Research and Development
Optimizing matrix multiplication for a short-vector SIMD architecture - CELL processor. Parallel Computing
Optimization of sparse matrix-vector multiplication on emerging multicore platforms. Parallel Computing
Concurrent number cruncher: a GPU implementation of a general sparse linear solver. International Journal of Parallel, Emergent and Distributed Systems
An analytical model for a GPU architecture with memory-level and thread-level parallelism awareness. Proceedings of the 36th annual international symposium on Computer architecture
Implementing sparse matrix-vector multiplication on throughput-oriented processors. Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis
An adaptive performance modeling tool for GPU architectures. Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
Model-driven autotuning of sparse matrix-vector multiply on GPUs. Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
Improving the Performance of the Sparse Matrix Vector Product with GPUs. CIT '10 Proceedings of the 2010 10th IEEE International Conference on Computer and Information Technology
Automatically tuning sparse matrix-vector multiplication for GPU architectures. HiPEAC'10 Proceedings of the 5th international conference on High Performance Embedded Architectures and Compilers

The BiConjugate gradient method on GPUs. The Journal of Supercomputing
Algebraic flux correction for nonconforming finite element discretizations of scalar transport problems. Computing

Abstract

The sparse matrix vector product (SpMV) is a key operation in a wide range of engineering and scientific computing applications, so accelerating it is of broad interest. Graphics Processing Units (GPUs) have recently emerged as platforms that yield outstanding acceleration factors, and several SpMV implementations for GPUs have already been proposed. This work focuses on the ELLR-T algorithm for computing SpMV on GPU architectures, whose performance depends strongly on the optimal selection of two parameters. Because memory operations dominate the performance of ELLR-T, an analytical model is proposed to auto-tune ELLR-T for particular combinations of sparse matrix and GPU architecture. Evaluation results on a representative set of test matrices show that the average performance achieved by ELLR-T, when auto-tuned with the proposed model, is close to the optimum. A comparative analysis against a variety of previous proposals shows that ELLR-T with the estimated configuration achieves the best performance on GPU architectures for this set of test matrices.
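
To make the idea concrete, the following is a minimal CPU-side sketch in Python of an ELLPACK-R style storage scheme and of an SpMV in which T "threads" cooperate on each row. It is an illustration under stated assumptions, not the authors' GPU kernel: the abstract does not name the two tuning parameters, so the threads-per-row value T used here is an assumption, the function names to_ellr and spmv_ellr_t are hypothetical, and the simple column-major padding shown does not reproduce the memory layout a real GPU implementation would use for coalesced access.

```python
def to_ellr(dense):
    """Convert a dense matrix (list of rows) to an ELLPACK-R style layout.

    Returns padded value and column-index arrays stored column-major,
    plus the number of nonzeros in each row (the "R" part).
    """
    n_rows = len(dense)
    rows = [[(j, v) for j, v in enumerate(row) if v != 0] for row in dense]
    row_len = [len(r) for r in rows]
    width = max(row_len, default=0)
    # Entry k of row i lives at index k * n_rows + i (column-major padding).
    vals = [0.0] * (width * n_rows)
    cols = [0] * (width * n_rows)
    for i, r in enumerate(rows):
        for k, (j, v) in enumerate(r):
            vals[k * n_rows + i] = float(v)
            cols[k * n_rows + i] = j
    return vals, cols, row_len, n_rows


def spmv_ellr_t(vals, cols, row_len, n_rows, x, T=1):
    """Emulate y = A x with T cooperating 'threads' per row (assumed parameter)."""
    y = [0.0] * n_rows
    for i in range(n_rows):
        partial = [0.0] * T
        for t in range(T):  # each t plays the role of one GPU thread
            # Thread t handles nonzeros t, t + T, t + 2T, ... of row i.
            for k in range(t, row_len[i], T):
                idx = k * n_rows + i
                partial[t] += vals[idx] * x[cols[idx]]
        y[i] = sum(partial)  # stands in for the per-row reduction on the GPU
    return y


A = [[4, 0, 1],
     [0, 0, 2],
     [3, 5, 6]]
vals, cols, row_len, n = to_ellr(A)
print(spmv_ellr_t(vals, cols, row_len, n, [1.0, 2.0, 3.0], T=2))
```

The result does not depend on T; only the way the work on each row is split changes. On a GPU that split affects memory access patterns and occupancy, which is why a per-matrix, per-architecture choice of such parameters would matter for performance.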