Internet Streaming SIMD Extensions
Computer
Iterative Methods for Sparse Linear Systems
Automatic performance tuning of sparse matrix kernels
Sparsity: Optimization Framework for Sparse Matrix Kernels
International Journal of High Performance Computing Applications
Optimization of sparse matrix-vector multiplication on emerging multicore platforms
Proceedings of the 2007 ACM/IEEE conference on Supercomputing
Implementing sparse matrix-vector multiplication on throughput-oriented processors
Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis
Model-driven autotuning of sparse matrix-vector multiply on GPUs
Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
Improving the Performance of the Sparse Matrix Vector Product with GPUs
CIT '10 Proceedings of the 2010 10th IEEE International Conference on Computer and Information Technology
Auto-Tuning CUDA Parameters for Sparse Matrix-Vector Multiplication on GPUs
ICCIS '10 Proceedings of the 2010 International Conference on Computational and Information Sciences
Proceedings of the Fourth Workshop on General Purpose Processing on Graphics Processing Units
The University of Florida sparse matrix collection
ACM Transactions on Mathematical Software (TOMS)
OpenCL Programming Guide
Reduced-Bandwidth Multithreaded Algorithms for Sparse Matrix-Vector Multiplication
IPDPS '11 Proceedings of the 2011 IEEE International Parallel & Distributed Processing Symposium
Automatically tuning sparse matrix-vector multiplication for GPU architectures
HiPEAC'10 Proceedings of the 5th international conference on High Performance Embedded Architectures and Compilers
SMAT: an input adaptive auto-tuner for sparse matrix-vector multiplication
Proceedings of the 34th ACM SIGPLAN conference on Programming language design and implementation
Accelerating sparse matrix-vector multiplication on GPUs using bit-representation-optimized schemes
SC '13 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
yaSpMV: yet another SpMV framework on GPUs
Proceedings of the 19th ACM SIGPLAN symposium on Principles and practice of parallel programming
The sparse matrix-vector multiplication (SpMV) kernel is a key computation in linear algebra: most iterative methods consist of SpMV operations combined with BLAS1 updates. Researchers have therefore made extensive efforts to optimize the SpMV kernel in sparse linear algebra. With the advent of OpenCL, a programming standard that unifies parallel programming across a wide variety of heterogeneous platforms, the SpMV kernel can be optimized on many different platforms. In this paper, we propose a new sparse matrix format, the Cocktail Format, which combines the strengths of many different sparse matrix formats. Based on the Cocktail Format, we develop the clSpMV framework, which analyzes arbitrary sparse matrices at runtime and recommends the best representation of a given sparse matrix on each platform. Although solutions that are portable across diverse platforms generally deliver lower performance than solutions specialized to particular platforms, our experimental results show that clSpMV can find the best representations of the input sparse matrices on both Nvidia and AMD platforms, delivering 83% higher performance than the vendor-optimized CUDA implementation of the hybrid sparse format proposed in [3], and 63.6% higher performance than the CUDA implementations of all sparse formats in [3].