IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)
Design patterns for scientific computations on sparse matrices
Euro-Par'11 Proceedings of the 2011 international conference on Parallel Processing
clSpMV: A Cross-Platform OpenCL SpMV Framework on GPUs
Proceedings of the 26th ACM international conference on Supercomputing
GPU-based parallel algorithms for sparse nonlinear systems
Journal of Parallel and Distributed Computing
GPU acceleration of the matrix-free interior point method
PPAM'11 Proceedings of the 9th international conference on Parallel Processing and Applied Mathematics - Volume Part I
The BiConjugate gradient method on GPUs
The Journal of Supercomputing
Development of a unified FDTD-FEM library for electromagnetic analysis with CPU and GPU computing
The Journal of Supercomputing
Semi-sparse algorithm based on multi-layer optimization for recommender system
The Journal of Supercomputing
Design patterns for sparse-matrix computations on hybrid CPU/GPU platforms
Scientific Programming
Hi-index | 0.00 |
Sparse matrices are involved in linear systems, eigensystems and partial differential equations from a wide spectrum of scientific and engineering disciplines. Hence, sparse matrix vector product (SpMV) is considered as key operation in engineering and scientific computing. For these applications the optimization of the sparse matrix vector product (SpMV) is very relevant. However, the irregular computation involved in SpMV prevents the optimum exploitation of computational architectures when the sparse matrices are very large. Graphics Processing Units (GPUs) have recently emerged as platforms that yield outstanding acceleration factors. SpMV implementations for GPUs have already appeared on the scene. This work proposes and evaluates new implementations of SpMV for GPUs called ELLR-T. They are based on the format ELLPACK-R, which allows storage of the sparse matrix in a regular manner. A comparative evaluation against a variety of storage formats previously proposed has been carried out based on a representative set of test matrices. The results show that: (1) the SpMV is highly accelerated with GPUs; (2) the performance strongly depends on the specific pattern of the matrix; and (3) the implementations ELLR-T achieve higher overall performance. Consequently, the new implementations of SpMV, ELLR-T, described in this paper can help to exploit the GPUs, because, they achieve high performance and they can be easily joined in the engineering and scientific computing.