Performance Optimization and Modeling of Blocked Sparse Kernels

Authors:
Alfredo Buttari; Victor Eijkhout; Julien Langou; Salvatore Filippone
Affiliations:
Innovative Computing Laboratory, University of Tennessee, Knoxville, TN; Texas Advanced Computing Laboratory, The University of Texas at Austin; Department of Mathematical Sciences, University of Colorado at Denver and Health Sciences Center, CO; Tor Vergata University, Rome, Italy
Venue:
International Journal of High Performance Computing Applications
Year:
2007



Abstract

We present a method for automatically selecting optimal implementations of sparse matrix-vector operations. Our software "AcCELS" (Accelerated Compress-storage Elements for Linear Solvers) involves a setup phase that probes machine characteristics, and a run-time phase where stored characteristics are combined with a measure of the actual sparse matrix to find the optimal kernel implementation. We present a performance model that is shown to be accurate over a large range of matrices.
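
The two-phase idea in the abstract can be illustrated with a minimal sketch. It is not the AcCELS implementation or the paper's performance model: the function names, the `machine_rates` table and the rate-divided-by-fill heuristic are all assumptions chosen for illustration. The table stands in for what a setup phase might record per block size, and the fill ratio stands in for the "measure of the actual sparse matrix".

```python
from typing import Dict, List, Tuple

Block = Tuple[int, int]  # (block rows, block columns)


def fill_ratio(rows: List[int], cols: List[int], block: Block) -> float:
    """Stored entries per true nonzero when the matrix is tiled into r x c blocks.

    Every block containing at least one nonzero is stored densely, so the
    result is >= 1.0; it equals 1.0 only when no explicit zeros are added.
    """
    r, c = block
    touched = {(i // r, j // c) for i, j in zip(rows, cols)}
    return len(touched) * r * c / len(rows)


def select_block(rows: List[int], cols: List[int],
                 machine_rates: Dict[Block, float]) -> Block:
    """Return the block size with the best estimated useful rate.

    machine_rates is a hypothetical setup-phase table (rate per block size).
    The estimate divides that rate by the fill ratio, a stand-in heuristic
    and not the model described in the paper.
    """
    return max(machine_rates,
               key=lambda b: machine_rates[b] / fill_ratio(rows, cols, b))


# A 4 x 4 matrix made of two dense 2 x 2 diagonal blocks (coordinate format).
rows = [0, 0, 1, 1, 2, 2, 3, 3]
cols = [0, 1, 0, 1, 2, 3, 2, 3]

# Made-up rates, for illustration only; they are not measurements.
machine_rates = {(1, 1): 100.0, (2, 2): 180.0, (4, 4): 250.0}

best = select_block(rows, cols, machine_rates)
```

In this example the 4 x 4 blocking has the highest raw rate but doubles the stored entries, so the sketch prefers 2 x 2. This shows why machine characteristics alone do not determine the kernel choice.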