The Conjugate Gradient (CG) method is a widely used iterative method for solving linear systems described by a (sparse) matrix. The method requires a large number of sparse matrix-vector (SpMV) multiplications, vector reductions, and other vector operations. We present a number of mappings for the SpMV operation on modern programmable GPUs using the Block Compressed Sparse Row (BCSR) format. Further, we show that reordering matrix blocks substantially improves the performance of the SpMV operation, especially when small blocks are used, so that our method outperforms existing state-of-the-art approaches in most cases. Finally, we present a thorough performance analysis of both the SpMV and CG methods, which allows us to model and estimate the expected maximum performance for a given (unseen) problem.
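To make the two building blocks named above concrete, the following is a minimal CPU reference sketch of a BCSR SpMV and a textbook CG loop built on it. It is not the GPU mapping or the block reordering described in the abstract. The function names (`bcsr_spmv`, `conjugate_gradient`), the row-major block layout, and the small 4x4 test matrix are all assumptions chosen for illustration.

```python
# Hypothetical reference sketch: BCSR SpMV plus unpreconditioned CG in pure Python.
# Names and data layout are illustrative assumptions, not the paper's implementation.

def bcsr_spmv(n_block_rows, r, c, row_ptr, col_idx, blocks, x):
    """Compute y = A @ x for A stored in BCSR with dense r x c blocks.

    row_ptr[bi]..row_ptr[bi + 1] indexes the blocks of block row bi,
    col_idx[k] is the block column of block k, and blocks[k] holds the
    r * c values of block k in row-major order.
    """
    y = [0.0] * (n_block_rows * r)
    for bi in range(n_block_rows):
        for k in range(row_ptr[bi], row_ptr[bi + 1]):
            bj = col_idx[k]
            blk = blocks[k]
            for i in range(r):
                acc = 0.0
                for j in range(c):
                    acc += blk[i * c + j] * x[bj * c + j]
                y[bi * r + i] += acc
    return y


def dot(u, v):
    """Vector reduction used by CG."""
    return sum(a * b for a, b in zip(u, v))


def conjugate_gradient(matvec, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for symmetric positive definite A, given only matvec(v) = A v."""
    x = [0.0] * len(b)
    r = list(b)          # residual b - A x for the zero initial guess
    p = list(r)          # first search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:
            break
        Ap = matvec(p)   # the one SpMV per iteration
        alpha = rs_old / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        beta = rs_new / rs_old
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


# Illustrative 4x4 symmetric, strictly diagonally dominant matrix in 2x2 blocks:
#   [4 1 0 0]
#   [1 3 0 1]
#   [0 0 2 0]
#   [0 1 0 3]
row_ptr = [0, 2, 4]
col_idx = [0, 1, 0, 1]
blocks = [
    [4.0, 1.0, 1.0, 3.0],  # block (0, 0)
    [0.0, 0.0, 0.0, 1.0],  # block (0, 1)
    [0.0, 0.0, 0.0, 1.0],  # block (1, 0)
    [2.0, 0.0, 0.0, 3.0],  # block (1, 1)
]


def matvec(v):
    return bcsr_spmv(2, 2, 2, row_ptr, col_idx, blocks, v)


b = matvec([1.0, 2.0, 3.0, 4.0])
x = conjugate_gradient(matvec, b)
```

Each CG iteration in this sketch performs one SpMV, two dot-product reductions, and three vector updates, which is the operation mix the abstract refers to. Storing dense blocks lets one column index serve `r * c` values, at the cost of storing explicit zeros inside partially filled blocks.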