Concurrent number cruncher: an efficient sparse linear solver on the GPU

Authors:
Luc Buatois; Guillaume Caumon; Bruno Lévy
Affiliations:
Gocad Research Group, INRIA, Nancy Université, France; ENSG, CRPG, Nancy Université, France; ALICE, INRIA Lorraine, Nancy, France
Venue:
HPCC'07: Proceedings of the Third International Conference on High Performance Computing and Communications
Year:
2007

Abstract

A wide class of geometry processing and PDE resolution methods needs to solve a linear system whose matrix has a non-zero pattern dictated by the connectivity of the mesh. The advent of GPUs, with their ever-growing parallel horsepower, makes them a tempting resource for such numerical computations. New APIs (CTM from ATI and CUDA from NVIDIA) make this easier by giving direct access to the multithreaded computational resources and associated memory bandwidth of GPUs; CUDA even provides a BLAS implementation (CuBLAS), but only for dense matrices. However, existing GPU linear solvers are either restricted to specific types of matrices or rely on suboptimal compressed row storage (CRS) strategies. By combining recent GPU programming techniques with supercomputing strategies, namely block compressed row storage (BCRS) and register blocking, we implement a general-purpose sparse linear solver that outperforms leading-edge CPU counterparts (Intel MKL and AMD ACML).
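The abstract names block compressed row storage (BCRS) and register blocking without showing what they look like. Below is a minimal, sequential Python sketch of a BCRS sparse matrix-vector product, not the authors' GPU code: the `BCRSMatrix` class, its field names, and the 2x2 block size in the example are illustrative assumptions. The matrix is tiled into small dense blocks, one column index is stored per block instead of one per nonzero, and each block row's partial sums are accumulated in a small local array, which is the role registers would play in a register-blocked GPU kernel.

```python
from typing import Dict, List, Tuple


class BCRSMatrix:
    """Hypothetical sparse matrix in block compressed row storage (BCRS).

    The matrix is tiled into dense block x block tiles; only tiles holding at
    least one nonzero are stored, each as block*block values in row-major
    order. Zeros inside a stored tile are kept explicitly.
    """

    def __init__(self, n_rows: int, n_cols: int, block: int,
                 entries: List[Tuple[int, int, float]]) -> None:
        # Assumption of this sketch: dimensions are multiples of the block size.
        if n_rows % block or n_cols % block:
            raise ValueError("dimensions must be multiples of the block size")
        self.n_rows, self.n_cols, self.block = n_rows, n_cols, block
        # Gather (row, col, value) triplets into dense tiles keyed by
        # (block row, block column); duplicate triplets are summed.
        tiles: Dict[Tuple[int, int], List[float]] = {}
        for i, j, v in entries:
            tile = tiles.setdefault((i // block, j // block), [0.0] * block * block)
            tile[(i % block) * block + (j % block)] += v
        n_brows = n_rows // block
        self.row_ptr = [0] * (n_brows + 1)  # tiles of block row br: row_ptr[br]..row_ptr[br+1]-1
        self.col_idx: List[int] = []        # block-column index of each stored tile
        self.values: List[float] = []       # block*block values per tile, row-major
        for br, bc in sorted(tiles):        # sorted by block row, then block column
            self.col_idx.append(bc)
            self.values.extend(tiles[(br, bc)])
            self.row_ptr[br + 1] += 1
        for br in range(n_brows):           # prefix sum turns per-row counts into offsets
            self.row_ptr[br + 1] += self.row_ptr[br]

    def matvec(self, x: List[float]) -> List[float]:
        """Return y = A x, processing one block row at a time."""
        b = self.block
        y = [0.0] * self.n_rows
        for br in range(self.n_rows // b):
            # The b partial sums of this block row; a GPU thread handling this
            # block row would keep them in registers (register blocking).
            acc = [0.0] * b
            for k in range(self.row_ptr[br], self.row_ptr[br + 1]):
                base = k * b * b
                x0 = self.col_idx[k] * b  # one column index serves b*b values
                for r in range(b):
                    for c in range(b):
                        acc[r] += self.values[base + r * b + c] * x[x0 + c]
            y[br * b:(br + 1) * b] = acc
        return y


# 4x4 example with 2x2 tiles: four stored tiles, one column index each.
A = BCRSMatrix(4, 4, 2, [(0, 0, 4.0), (0, 1, 1.0), (1, 0, 1.0), (1, 1, 3.0),
                         (0, 3, 1.0), (2, 2, 2.0), (3, 0, 1.0), (3, 3, 5.0)])
print(A.row_ptr, A.col_idx)             # [0, 2, 4] [0, 1, 0, 1]
print(A.matvec([1.0, 2.0, 3.0, 4.0]))  # [10.0, 7.0, 6.0, 21.0]
```

The design trade-off is visible in the example: the off-diagonal tiles each hold a single nonzero, so BCRS spends extra multiply-adds on stored zeros in exchange for fewer index loads and contiguous reads of `x`. This typically pays off when the nonzero pattern is naturally blocky, for instance when several unknowns are attached to each mesh vertex.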