GPU-accelerated preconditioned iterative linear solvers

Authors:
Ruipeng Li; Yousef Saad
Affiliations:
Department of Computer Science & Engineering, University of Minnesota, Minneapolis, USA 55455; Department of Computer Science & Engineering, University of Minnesota, Minneapolis, USA 55455
Venue:
The Journal of Supercomputing
Year:
2013

Abstract

This work presents an overview of our preliminary experience in developing a high-performance iterative linear solver accelerated by GPU coprocessors. Our goal is to illustrate the advantages and difficulties encountered when deploying GPU technology for sparse linear algebra computations. We discuss techniques for speeding up sparse matrix-vector product (SpMV) kernels and for finding suitable preconditioning methods. Our experiments with an NVIDIA Tesla M2070 show that, for unstructured matrices, SpMV kernels can be up to 8 times faster on the GPU than Intel MKL on the host Intel Xeon X5675 processor. The overall gain for preconditioned solvers is smaller: the GPU-accelerated CG method preconditioned with incomplete Cholesky (IC) factorization outperforms its CPU counterpart by a factor of up to 3, and the GPU-accelerated GMRES method preconditioned with incomplete LU (ILU) factorization achieves a speed-up nearing 4. With preconditioning techniques better suited to GPUs, this performance can be improved further.
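
To make the two ingredients named in the abstract concrete, the following is a minimal CPU-side sketch in plain Python, not the authors' implementation. It assumes the matrix is stored in compressed sparse row (CSR) format and, for brevity, swaps in a simple diagonal (Jacobi) preconditioner where the paper uses IC factorization. The function names `csr_spmv` and `pcg` and the small test matrix are hypothetical, chosen only for illustration.

```python
from typing import List, Sequence


def csr_spmv(row_ptr: Sequence[int], col_idx: Sequence[int],
             vals: Sequence[float], x: Sequence[float]) -> List[float]:
    """Compute y = A x for a matrix A stored in CSR format.

    Each row is an independent dot product, which is why a basic GPU
    kernel can assign one thread (or a group of threads) to each row.
    """
    n = len(row_ptr) - 1
    y = [0.0] * n
    for i in range(n):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += vals[k] * x[col_idx[k]]
        y[i] = acc
    return y


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def pcg(row_ptr, col_idx, vals, b, inv_diag, tol=1e-10, max_iter=100):
    """Preconditioned CG for a symmetric positive definite CSR matrix.

    ASSUMPTION: the preconditioner is Jacobi (inv_diag[i] = 1 / A[i][i]),
    used here as a stand-in for the IC factorization in the abstract.
    """
    n = len(b)
    x = [0.0] * n
    r = list(b)                                   # residual for x = 0
    z = [inv_diag[i] * r[i] for i in range(n)]    # apply preconditioner
    p = list(z)
    rz = dot(r, z)
    for _ in range(max_iter):
        if dot(r, r) ** 0.5 <= tol:
            break
        ap = csr_spmv(row_ptr, col_idx, vals, p)  # the SpMV hot spot
        alpha = rz / dot(p, ap)
        x = [x[i] + alpha * p[i] for i in range(n)]
        r = [r[i] - alpha * ap[i] for i in range(n)]
        z = [inv_diag[i] * r[i] for i in range(n)]
        rz_new = dot(r, z)
        p = [z[i] + (rz_new / rz) * p[i] for i in range(n)]
        rz = rz_new
    return x


# Hypothetical 3x3 tridiagonal test matrix [[4,-1,0],[-1,4,-1],[0,-1,4]].
row_ptr = [0, 2, 5, 7]
col_idx = [0, 1, 0, 1, 2, 1, 2]
vals = [4.0, -1.0, -1.0, 4.0, -1.0, -1.0, 4.0]
b = csr_spmv(row_ptr, col_idx, vals, [1.0, 2.0, 3.0])
x = pcg(row_ptr, col_idx, vals, b, inv_diag=[0.25, 0.25, 0.25])
```

In this sketch every iteration performs one SpMV and one preconditioner application. The Jacobi step is trivially parallel, whereas IC and ILU preconditioners require sparse triangular solves with far less parallelism, which is consistent with the abstract's observation that the full solvers speed up less than the SpMV kernel alone.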