A wide class of geometry processing and PDE solution methods requires solving a linear system whose non-zero pattern is dictated by the connectivity of the mesh. GPUs, with their ever-growing parallel processing power, are a tempting resource for such numerical computations. New APIs (CTM from ATI and CUDA from NVIDIA) help by giving direct access to the multithreaded computational resources and associated memory bandwidth of GPUs; CUDA even provides a BLAS implementation (CUBLAS), but only for dense matrices. However, existing GPU linear solvers are restricted to specific types of matrices or use suboptimal compressed row storage strategies. By combining recent GPU programming techniques with supercomputing strategies (namely block compressed row storage and register blocking), we implement a general-purpose sparse linear solver that outperforms leading-edge CPU counterparts (MKL / ACML).
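To make the two storage ideas named above concrete, here is a minimal CPU-side sketch of block compressed row storage (BCRS) with 2x2 blocks and a matrix-vector product over it. It is an illustration under stated assumptions, not the solver described in the abstract: the block size, the array layout, and the function names `dense_to_bcrs` and `bcrs_spmv` are all hypothetical, and the local accumulators only mimic what register blocking would keep in registers in a compiled CPU or GPU kernel.

```python
# Hypothetical sketch of 2x2 block compressed row storage (BCRS).
# All names and the layout below are assumptions made for illustration.

BS = 2  # assumed block size: each stored block is BS x BS


def dense_to_bcrs(a):
    """Convert a dense square matrix (list of rows, size a multiple of BS)
    into three BCRS arrays: row_ptr, col_idx, values."""
    nb = len(a) // BS          # number of block rows / block columns
    row_ptr = [0]              # row_ptr[bi]..row_ptr[bi+1] index the blocks of block row bi
    col_idx = []               # block-column index of each stored block
    values = []                # each stored block flattened row-major (BS*BS entries)
    for bi in range(nb):
        for bj in range(nb):
            block = [a[bi * BS + i][bj * BS + j]
                     for i in range(BS) for j in range(BS)]
            if any(v != 0 for v in block):   # store only blocks with a non-zero
                col_idx.append(bj)
                values.extend(block)
        row_ptr.append(len(col_idx))
    return row_ptr, col_idx, values


def bcrs_spmv(row_ptr, col_idx, values, x):
    """Compute y = A x for a matrix stored in 2x2 BCRS."""
    nb = len(row_ptr) - 1
    y = [0.0] * (nb * BS)
    for bi in range(nb):
        # Two running sums, one per output row of this block row.
        # In a compiled kernel these would live in registers.
        y0 = 0.0
        y1 = 0.0
        for k in range(row_ptr[bi], row_ptr[bi + 1]):
            bj = col_idx[k]
            # Load the two x entries once and reuse them for both rows.
            x0 = x[bj * BS]
            x1 = x[bj * BS + 1]
            v = k * BS * BS    # offset of block k in the flat values array
            y0 += values[v] * x0 + values[v + 1] * x1
            y1 += values[v + 2] * x0 + values[v + 3] * x1
        y[bi * BS] = y0
        y[bi * BS + 1] = y1
    return y
```

Compared with scalar compressed row storage, this layout stores one column index per block rather than per coefficient, and each loaded pair of `x` entries is used for four multiply-adds. That reduction in index traffic and memory fetches per arithmetic operation is the usual motivation for blocking; whether it pays off depends on how many stored blocks are mostly zeros.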