The paper presents a multi-GPU implementation of the preconditioned conjugate gradient algorithm with an algebraic multigrid preconditioner (PCG-AMG) for an elliptic model problem on a 3D unstructured grid. An efficient parallel sparse matrix-vector multiplication scheme underlying the PCG-AMG algorithm is presented for the many-core GPU architecture. A performance comparison of the parallel solver shows that a single Nvidia Tesla C1060 GPU board delivers the performance of a sixteen-node InfiniBand cluster, and that a multi-GPU configuration with eight GPUs is about 100 times faster than a typical server CPU core.
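
To make the structure of the algorithm concrete, the following is a minimal serial sketch of preconditioned conjugate gradients built on a sparse matrix-vector product in CSR format. It is not the paper's implementation: the CSR layout is an assumed storage format, a Jacobi (diagonal) preconditioner stands in for the AMG preconditioner, the 1D Laplacian replaces the 3D unstructured-grid problem, and all function names (`csr_spmv`, `pcg`, `laplace_1d_csr`, `make_jacobi`) are hypothetical. The point is only to show that each iteration is dominated by one matrix-vector product and one preconditioner application, which is why the matrix-vector kernel matters for GPU performance.

```python
def csr_spmv(row_ptr, col_idx, vals, x):
    """Return y = A @ x for a matrix A stored in CSR format."""
    y = [0.0] * (len(row_ptr) - 1)
    for i in range(len(y)):
        s = 0.0
        # Nonzeros of row i occupy positions row_ptr[i] .. row_ptr[i+1]-1.
        for k in range(row_ptr[i], row_ptr[i + 1]):
            s += vals[k] * x[col_idx[k]]
        y[i] = s
    return y


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def pcg(row_ptr, col_idx, vals, b, apply_prec, tol=1e-10, max_iter=200):
    """Preconditioned CG for a symmetric positive definite CSR matrix.

    apply_prec(r) must return an approximation of A^{-1} r. In the paper
    this role is played by AMG; any SPD preconditioner fits this slot.
    Returns (solution, number of iterations performed).
    """
    x = [0.0] * len(b)
    r = list(b)                      # residual for the zero initial guess
    z = apply_prec(r)
    p = list(z)
    rz = dot(r, z)
    b_norm = dot(b, b) ** 0.5 or 1.0
    for it in range(max_iter):
        if dot(r, r) ** 0.5 <= tol * b_norm:
            return x, it
        Ap = csr_spmv(row_ptr, col_idx, vals, p)   # dominant cost per iteration
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        z = apply_prec(r)
        rz_new = dot(r, z)
        beta = rz_new / rz
        p = [zi + beta * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter


def laplace_1d_csr(n):
    """Assumed test matrix: tridiagonal (-1, 2, -1) 1D Laplacian in CSR."""
    row_ptr, col_idx, vals = [0], [], []
    for i in range(n):
        if i > 0:
            col_idx.append(i - 1)
            vals.append(-1.0)
        col_idx.append(i)
        vals.append(2.0)
        if i < n - 1:
            col_idx.append(i + 1)
            vals.append(-1.0)
        row_ptr.append(len(col_idx))
    return row_ptr, col_idx, vals


def make_jacobi(row_ptr, col_idx, vals):
    """Stand-in preconditioner (NOT AMG): divide by the matrix diagonal."""
    n = len(row_ptr) - 1
    diag = [0.0] * n
    for i in range(n):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            if col_idx[k] == i:
                diag[i] = vals[k]
    return lambda r: [ri / di for ri, di in zip(r, diag)]


if __name__ == "__main__":
    n = 16
    A = laplace_1d_csr(n)
    b = csr_spmv(*A, [1.0] * n)      # right-hand side with known solution of all ones
    x, iters = pcg(*A, b, make_jacobi(*A))
    print(iters, max(abs(xi - 1.0) for xi in x))
```

In a GPU version of this sketch, the row loop in `csr_spmv` is the natural unit of parallelism (for instance one thread or one warp per row), and in a multi-GPU setting each device would own a block of rows and exchange the vector entries it needs from its neighbours before each product. How the paper actually partitions and schedules this work is not stated in the abstract.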