Graphics processors are increasingly used in scientific applications because of their high computational power, which comes from hardware with multi-level parallelism and a memory hierarchy. Sparse matrix computations arise frequently in scientific applications, for example when solving PDEs on unstructured grids. However, traditional sparse matrix algorithms are difficult to parallelize efficiently on GPUs because of their irregular memory access patterns. In this paper, we present a new storage format for sparse matrices that better exploits locality, has a low memory footprint, and enables automatic specialization for various matrices and future devices via parameter tuning. Experimental evaluation demonstrates significant speedups compared to previously published results.
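To make the irregular memory references concrete, the following is a minimal sketch of sparse matrix-vector multiplication over the conventional compressed sparse row (CSR) layout. It is not the storage format presented in this paper; the function name `csr_spmv` and the small example matrix are assumptions chosen only for illustration.

```python
def csr_spmv(row_ptr, col_idx, values, x):
    """Compute y = A @ x for a matrix A stored in CSR form (illustrative baseline)."""
    y = [0.0] * (len(row_ptr) - 1)
    for i in range(len(y)):
        acc = 0.0
        # Nonzeros of row i occupy positions row_ptr[i] .. row_ptr[i + 1] - 1.
        for k in range(row_ptr[i], row_ptr[i + 1]):
            # x is read through col_idx, so the accessed addresses depend on
            # the sparsity pattern rather than on the loop index.
            acc += values[k] * x[col_idx[k]]
        y[i] = acc
    return y


# Hypothetical 3x3 example matrix:
# [[4, 0, 1],
#  [0, 2, 0],
#  [3, 0, 5]]
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
values = [4.0, 1.0, 2.0, 3.0, 5.0]
x = [1.0, 2.0, 3.0]

y = csr_spmv(row_ptr, col_idx, values, x)
print(y)
```

The indirect read `x[col_idx[k]]` is the kind of data-dependent access that is hard to map onto GPU memory efficiently, and rows with different numbers of nonzeros give parallel workers unequal amounts of work.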