Analysis and performance estimation of the Conjugate Gradient method on multiple GPUs

Authors:
Mickeal Verschoor;Andrei C. Jalba
Affiliations:
Institute for Mathematics and Computer Science, Eindhoven University of Technology, P.O. Box 513, 5600 MB Eindhoven, The Netherlands;Institute for Mathematics and Computer Science, Eindhoven University of Technology, P.O. Box 513, 5600 MB Eindhoven, The Netherlands
Venue:
Parallel Computing
Year:
2012

Abstract

The Conjugate Gradient (CG) method is a widely used iterative method for solving linear systems described by a (sparse) matrix. The method requires a large number of Sparse Matrix-Vector (SpMV) multiplications, vector reductions and other vector operations. We present a number of mappings for the SpMV operation on modern programmable GPUs using the Block Compressed Sparse Row (BCSR) format. Further, we show that reordering matrix blocks substantially improves the performance of the SpMV operation, especially when small blocks are used, so that our method outperforms existing state-of-the-art approaches in most cases. Finally, we perform a thorough analysis of the performance of both the SpMV and CG methods, which allows us to model and estimate the expected maximum performance for a given (unseen) problem.
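
To make the operations named in the abstract concrete, the following is a minimal CPU-side sketch in Python of an SpMV over a BCSR-style layout and of the textbook (unpreconditioned) CG iteration built on it. It is an illustration only and not the GPU mappings or block reordering described in the paper. The function names (bcsr_spmv, dot, conjugate_gradient), the array names (row_ptr, col_idx, blocks), the row-major storage of each square block, and the small 4-by-4 test matrix are all assumptions made for this example.

```python
def bcsr_spmv(row_ptr, col_idx, blocks, b, x):
    """Return y = A @ x, where A is stored as dense b-by-b blocks.

    row_ptr[i]..row_ptr[i+1] indexes the blocks of block row i,
    col_idx[k] is the block column of block k, and blocks[k] holds
    the b*b entries of block k in row-major order (assumed layout).
    """
    n_block_rows = len(row_ptr) - 1
    y = [0.0] * (n_block_rows * b)
    for bi in range(n_block_rows):
        for k in range(row_ptr[bi], row_ptr[bi + 1]):
            bj = col_idx[k]
            blk = blocks[k]
            for r in range(b):
                acc = 0.0
                for c in range(b):
                    acc += blk[r * b + c] * x[bj * b + c]
                y[bi * b + r] += acc
    return y


def dot(u, v):
    """Vector reduction: inner product of u and v."""
    return sum(ui * vi for ui, vi in zip(u, v))


def conjugate_gradient(spmv, rhs, tol=1e-10, max_iter=1000):
    """Solve A x = rhs for symmetric positive definite A.

    spmv is a callable computing A @ p; each iteration uses one
    SpMV, two reductions and three vector updates.
    """
    x = [0.0] * len(rhs)
    r = list(rhs)          # residual for the initial guess x = 0
    p = list(r)            # first search direction
    rr = dot(r, r)
    for _ in range(max_iter):
        if rr ** 0.5 < tol:
            break
        ap = spmv(p)
        alpha = rr / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rr_new = dot(r, r)
        p = [ri + (rr_new / rr) * pi for ri, pi in zip(r, p)]
        rr = rr_new
    return x


# Hypothetical 4x4 symmetric, diagonally dominant matrix in 2x2 blocks:
# [[4, 1, 1, 0],
#  [1, 3, 0, 1],
#  [1, 0, 2, 0],
#  [0, 1, 0, 5]]
ROW_PTR = [0, 2, 4]
COL_IDX = [0, 1, 0, 1]
BLOCKS = [
    [4.0, 1.0, 1.0, 3.0],
    [1.0, 0.0, 0.0, 1.0],
    [1.0, 0.0, 0.0, 1.0],
    [2.0, 0.0, 0.0, 5.0],
]


def apply_a(v):
    """SpMV with the example matrix above."""
    return bcsr_spmv(ROW_PTR, COL_IDX, BLOCKS, 2, v)


rhs = apply_a([1.0, 2.0, 3.0, 4.0])
solution = conjugate_gradient(apply_a, rhs)
```

In this sketch the inner loops over r and c touch one dense block at a time, which is the property a blocked format exposes to hardware; how those loops are assigned to GPU threads is exactly what the mappings in the paper address and is not modeled here.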