Memory bandwidth is a major limiting factor in the scalability of parallel iterative algorithms that rely on sparse matrix-vector multiplication (SpMV). This paper introduces Hierarchical Diagonal Blocking (HDB), an approach that we believe captures many of the existing optimization techniques for SpMV in a common representation. Using this representation in conjunction with precision-reduction techniques, we develop and evaluate high-performance SpMV kernels. We also study the implications of using our SpMV kernels in a complete iterative solver. Our method of choice is a Combinatorial Multigrid solver that can fully utilize our fastest reduced-precision SpMV kernel without sacrificing the quality of the solution. We provide an extensive empirical evaluation of the approach on a variety of benchmark matrices, demonstrating substantial speedups on all matrices considered.
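To make the bandwidth argument concrete, the following is a minimal sketch of SpMV over a matrix in compressed sparse row (CSR) form, with the nonzero values stored once in double precision and once in single precision. It is not the HDB representation or any kernel from the paper; the helper names `csr_from_dense` and `spmv` and the 4x4 example matrix are hypothetical, chosen only to illustrate how reducing the precision of the stored values shrinks the data that must be streamed from memory.

```python
from array import array


def csr_from_dense(rows, value_type="d"):
    """Build CSR arrays (values, column indices, row pointers) from dense rows.

    value_type is an array typecode: "d" for double, "f" for single precision.
    """
    values = array(value_type)
    col_idx = array("i")
    row_ptr = array("i", [0])
    for row in rows:
        for j, v in enumerate(row):
            if v != 0.0:
                values.append(v)
                col_idx.append(j)
        row_ptr.append(len(values))
    return values, col_idx, row_ptr


def spmv(values, col_idx, row_ptr, x):
    """Compute y = A x for A in CSR form.

    Values are read at their stored precision; accumulation uses Python
    floats, which are double precision.
    """
    y = [0.0] * (len(row_ptr) - 1)
    for i in range(len(y)):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y[i] = acc
    return y


# Hypothetical example: a small symmetric, diagonally dominant matrix.
A = [
    [4.0, -1.0, 0.0, -1.0],
    [-1.0, 4.0, -1.0, 0.0],
    [0.0, -1.0, 4.0, -1.0],
    [-1.0, 0.0, -1.0, 4.0],
]
x = [1.0, 2.0, 3.0, 4.0]

vals64, cols, ptr = csr_from_dense(A, "d")
vals32, _, _ = csr_from_dense(A, "f")

y64 = spmv(vals64, cols, ptr, x)
y32 = spmv(vals32, cols, ptr, x)

# Bytes occupied by the value arrays alone (index arrays are unchanged).
value_bytes_64 = vals64.itemsize * len(vals64)
value_bytes_32 = vals32.itemsize * len(vals32)
```

The entries of this example matrix are small integers, which single precision represents exactly, so `y32` and `y64` agree here; for general matrices the single-precision copy introduces rounding error, which is why the choice of solver matters when reduced-precision kernels are used. Only the value array shrinks in this sketch, while the column indices and row pointers stay the same size.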