SMAT: an input adaptive auto-tuner for sparse matrix-vector multiplication

Authors:
Jiajia Li;Guangming Tan;Mingyu Chen;Ninghui Sun
Affiliations:
Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China (all authors)
Venue:
Proceedings of the 34th ACM SIGPLAN conference on Programming language design and implementation
Year:
2013

Abstract

Sparse matrix-vector multiplication (SpMV) is an important kernel in both traditional high-performance computing and emerging data-intensive applications. So far, SpMV libraries have been optimized with either application-specific or architecture-specific approaches, which makes them too complicated to be used widely in real applications. In this work we develop SMAT, a sparse matrix-vector multiplication auto-tuning system, to bridge the gap between specific optimizations and general-purpose usage. SMAT provides users with a unified programming interface in compressed sparse row (CSR) format and automatically determines the optimal format and implementation for any input sparse matrix at runtime. For this purpose, SMAT leverages a learning model to quickly predict the best combination of the matrix feature parameters. The model is generated in an off-line stage by a machine learning method with a training set of more than 2000 matrices from the UF sparse matrix collection. Our experiments show that SMAT achieves up to 51 GFLOPS in single precision and 37 GFLOPS in double precision on mainstream x86 multi-core processors, both more than 3 times faster than the Intel MKL library. We also demonstrate its adaptability in an algebraic multigrid solver from the Hypre library, with a reported performance improvement of more than 20%.
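
The idea of accepting a matrix in CSR form and choosing a storage format from its structural features can be illustrated with a minimal Python sketch. It is not SMAT's actual interface or model. The names `CSRMatrix`, `spmv_csr`, `extract_features`, and `choose_format`, the candidate formats (DIA, ELL, CSR), and the thresholds are all assumptions made for illustration, and hand-written rules stand in for the learned model.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class CSRMatrix:
    """Compressed sparse row storage (hypothetical container for this sketch)."""
    n_rows: int
    n_cols: int
    row_ptr: List[int]    # length n_rows + 1; row i owns entries row_ptr[i]:row_ptr[i+1]
    col_idx: List[int]    # column index of each stored nonzero
    values: List[float]   # value of each stored nonzero


def spmv_csr(A: CSRMatrix, x: List[float]) -> List[float]:
    """Reference CSR kernel computing y = A * x."""
    y = [0.0] * A.n_rows
    for i in range(A.n_rows):
        acc = 0.0
        for k in range(A.row_ptr[i], A.row_ptr[i + 1]):
            acc += A.values[k] * x[A.col_idx[k]]
        y[i] = acc
    return y


def extract_features(A: CSRMatrix) -> Dict[str, float]:
    """Collect a few structural features; this feature set is an assumption."""
    row_lengths = [A.row_ptr[i + 1] - A.row_ptr[i] for i in range(A.n_rows)]
    # Each distinct offset (column - row) is one occupied diagonal.
    diagonals = {
        A.col_idx[k] - i
        for i in range(A.n_rows)
        for k in range(A.row_ptr[i], A.row_ptr[i + 1])
    }
    nnz = len(A.values)
    return {
        "nnz": nnz,
        "avg_row_len": nnz / A.n_rows,
        "max_row_len": max(row_lengths),
        "n_diagonals": len(diagonals),
        # Fraction of useful entries if every occupied diagonal were stored in full.
        "dia_fill": nnz / (len(diagonals) * A.n_rows),
    }


def choose_format(features: Dict[str, float]) -> str:
    """Hand-written rules standing in for a learned model (thresholds are made up)."""
    if features["dia_fill"] >= 0.5:
        return "DIA"   # a few well-filled diagonals
    if features["max_row_len"] <= 1.5 * features["avg_row_len"]:
        return "ELL"   # row lengths are nearly uniform, so padding is cheap
    return "CSR"       # irregular rows: keep the input format


# Usage: a 4x4 tridiagonal matrix with 2 on the diagonal and -1 beside it.
tridiagonal = CSRMatrix(
    n_rows=4,
    n_cols=4,
    row_ptr=[0, 2, 5, 8, 10],
    col_idx=[0, 1, 0, 1, 2, 1, 2, 3, 2, 3],
    values=[2.0, -1.0, -1.0, 2.0, -1.0, -1.0, 2.0, -1.0, -1.0, 2.0],
)
y = spmv_csr(tridiagonal, [1.0, 1.0, 1.0, 1.0])
chosen = choose_format(extract_features(tridiagonal))
```

In this sketch the caller only ever builds a CSR matrix, and the format decision is made from features computed at runtime. A real auto-tuner would replace the fixed rules with a model trained off-line and would convert the matrix and dispatch to a kernel for the chosen format.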