We present a method for automatically selecting optimal implementations of sparse matrix-vector operations. Our software, "AcCELS" (Accelerated Compress-storage Elements for Linear Solvers), involves a setup phase that probes machine characteristics, and a run-time phase where stored characteristics are combined with a measure of the actual sparse matrix to find the optimal kernel implementation. We present a performance model that is shown to be accurate over a large range of matrices.
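The abstract does not say which machine characteristics are stored or which matrix measure is used, so the following is only a minimal sketch of the two-phase idea under stated assumptions. It assumes the setup phase has produced a table of sustained rates per candidate block size (the hypothetical `rates` argument), and that the run-time measure is the fill ratio of the matrix under each blocking, that is, stored elements including padding zeros divided by true nonzeros. The names `fill_ratio` and `select_kernel` are illustrative and are not part of AcCELS.

```python
from math import inf
from typing import Dict, Iterable, Set, Tuple

Block = Tuple[int, int]   # (rows, cols) of a candidate register block
Coord = Tuple[int, int]   # (row, col) of a stored nonzero


def fill_ratio(coords: Iterable[Coord], block: Block) -> float:
    """Stored elements (including padding zeros) divided by true nonzeros."""
    r, c = block
    nonzeros: Set[Coord] = set(coords)
    # Every r x c tile that holds at least one nonzero is stored in full.
    tiles = {(i // r, j // c) for i, j in nonzeros}
    return len(tiles) * r * c / len(nonzeros)


def select_kernel(coords: Iterable[Coord], rates: Dict[Block, float]) -> Block:
    """Run-time phase: pick the block size with the lowest modeled cost.

    `rates` stands in for the output of a setup phase (assumed here):
    elements processed per unit time by the kernel for each block size.
    Modeled cost = stored elements / rate.
    """
    nonzeros = set(coords)
    nnz = len(nonzeros)
    best, best_cost = None, inf
    for block, rate in rates.items():
        stored = fill_ratio(nonzeros, block) * nnz
        cost = stored / rate
        if cost < best_cost:
            best, best_cost = block, cost
    return best


# Assumed setup-phase result: the 2x2 kernel runs 1.5x faster per element.
rates = {(1, 1): 1.0, (2, 2): 1.5}

# Two dense 2x2 blocks on the diagonal: blocking adds no padding.
blocky = [(i, j) for b in (0, 2) for i in (b, b + 1) for j in (b, b + 1)]
# A 4x4 diagonal matrix: 2x2 blocking doubles the stored elements.
diagonal = [(i, i) for i in range(4)]

print(select_kernel(blocky, rates))
print(select_kernel(diagonal, rates))
```

In this sketch the faster blocked kernel is chosen only when its speed advantage outweighs the extra zeros it has to store, which is the kind of trade-off a performance model combining machine and matrix characteristics would have to capture.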