Sparse matrix-vector multiplication (SpMV) is an important kernel in both traditional high-performance computing and emerging data-intensive applications. So far, SpMV libraries have been optimized with either application-specific or architecture-specific approaches, which makes them too complicated to be used widely in real applications. In this work we develop a Sparse Matrix-vector multiplication Auto-Tuning system (SMAT) to bridge the gap between specific optimizations and general-purpose usage. SMAT provides users with a unified programming interface in compressed sparse row (CSR) format and automatically determines the optimal format and implementation for any input sparse matrix at runtime. For this purpose, SMAT leverages a learning model, generated in an off-line stage by a machine learning method from a training set of more than 2000 matrices from the UF sparse matrix collection, to quickly predict the best combination of the matrix feature parameters. Our experiments show that SMAT achieves up to 51 GFLOPS in single precision and 37 GFLOPS in double precision on mainstream x86 multi-core processors, both more than 3 times faster than the Intel MKL library. We also demonstrate its adaptability in an algebraic multigrid solver from the Hypre library, with a reported performance improvement of more than 20%.
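To make the idea of a CSR-facing interface with runtime format selection more concrete, the following is a minimal Python sketch, not SMAT's actual code. It shows a reference CSR SpMV, a few structural features computed from the CSR arrays, and a hand-written rule that stands in for the learned model. The names `CSRMatrix`, `spmv_csr`, `extract_features`, and `choose_format` are hypothetical, the candidate formats DIA and ELL are common choices picked for illustration (the abstract does not list SMAT's formats), and the thresholds are invented rather than learned from the training set.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class CSRMatrix:
    """CSR storage: row i owns entries row_ptr[i] .. row_ptr[i + 1] - 1."""
    n_rows: int
    n_cols: int
    row_ptr: List[int]
    col_idx: List[int]
    values: List[float]


def spmv_csr(a: CSRMatrix, x: List[float]) -> List[float]:
    """Reference y = A x computed directly on the CSR arrays."""
    y = [0.0] * a.n_rows
    for i in range(a.n_rows):
        acc = 0.0
        for k in range(a.row_ptr[i], a.row_ptr[i + 1]):
            acc += a.values[k] * x[a.col_idx[k]]
        y[i] = acc
    return y


def extract_features(a: CSRMatrix) -> Dict[str, float]:
    """Cheap structural features (illustrative, not SMAT's feature set)."""
    nnz = a.row_ptr[-1]
    row_lens = [a.row_ptr[i + 1] - a.row_ptr[i] for i in range(a.n_rows)]
    max_len = max(row_lens) if row_lens else 0
    # Each distinct offset (column - row) is one diagonal DIA would store.
    diagonals = set()
    for i in range(a.n_rows):
        for k in range(a.row_ptr[i], a.row_ptr[i + 1]):
            diagonals.add(a.col_idx[k] - i)
    ell_slots = max_len * a.n_rows          # padded slots in an ELL layout
    dia_slots = len(diagonals) * a.n_rows   # padded slots in a DIA layout
    return {
        "nnz": float(nnz),
        "ell_fill": nnz / ell_slots if ell_slots else 0.0,
        "dia_fill": nnz / dia_slots if dia_slots else 0.0,
    }


def choose_format(features: Dict[str, float]) -> str:
    """Stand-in for a learned predictor; thresholds are assumptions."""
    if features["nnz"] == 0:
        return "CSR"
    if features["dia_fill"] >= 0.7:
        return "DIA"   # few, densely populated diagonals
    if features["ell_fill"] >= 0.8:
        return "ELL"   # rows have nearly equal length
    return "CSR"       # irregular structure: keep the input format


# 4x4 tridiagonal matrix with 2 on the diagonal and -1 beside it.
tridiagonal = CSRMatrix(
    n_rows=4, n_cols=4,
    row_ptr=[0, 2, 5, 8, 10],
    col_idx=[0, 1, 0, 1, 2, 1, 2, 3, 2, 3],
    values=[2.0, -1.0, -1.0, 2.0, -1.0, -1.0, 2.0, -1.0, -1.0, 2.0],
)
chosen = choose_format(extract_features(tridiagonal))
result = spmv_csr(tridiagonal, [1.0, 1.0, 1.0, 1.0])
```

In this sketch the caller only ever hands over CSR data. A real auto-tuner would replace `choose_format` with a model trained off-line and would dispatch to a tuned kernel for the predicted format instead of always running the reference CSR loop.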