A wide class of numerical methods needs to solve a linear system in which the pattern of non-zero matrix coefficients can be arbitrary. These problems can benefit greatly from the highly multithreaded computational power and large memory bandwidth available on graphics processing units (GPUs), especially since dedicated general-purpose APIs such as Close To Metal (CTM, from AMD-ATI) and Compute Unified Device Architecture (CUDA, from NVIDIA) have appeared. CUDA even provides a BLAS implementation (CuBLAS), but only for dense matrices. Other existing linear solvers for the GPU are also limited by their internal matrix representation. This paper describes how to combine recent GPU programming techniques and new GPU-dedicated APIs with high-performance computing strategies (namely block compressed row storage (BCRS), register blocking and vectorization) to implement a sparse general-purpose linear solver. Our implementation of the Jacobi-preconditioned conjugate gradient algorithm outperforms leading-edge CPU counterparts by a factor of up to 6.0, making it attractive for applications that can accept single precision.
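To make the two central ingredients of the abstract concrete, the following is a minimal CPU-side sketch of block compressed row storage (BCRS) and a Jacobi-preconditioned conjugate gradient loop built on it. It is not the paper's GPU implementation: it is plain Python, assumes 2×2 blocks and a symmetric positive definite matrix with a non-zero diagonal, and leaves out register blocking and vectorization entirely. All names (`BS`, `dense_to_bcrs`, `bcrs_spmv`, `bcrs_diagonal`, `jacobi_pcg`) are hypothetical and chosen only for illustration.

```python
from math import sqrt

BS = 2  # block size; assumption for this sketch (2x2 blocks)

def dense_to_bcrs(a):
    """Pack a dense square matrix (size divisible by BS) into BCRS arrays."""
    nb = len(a) // BS
    row_ptr, col_ind, blocks = [0], [], []
    for bi in range(nb):
        for bj in range(nb):
            # Flatten the BS x BS block at block coordinates (bi, bj), row-major.
            blk = [a[bi * BS + r][bj * BS + c] for r in range(BS) for c in range(BS)]
            if any(v != 0.0 for v in blk):  # store only blocks with a non-zero
                col_ind.append(bj)
                blocks.append(blk)
        row_ptr.append(len(col_ind))  # end of this block row
    return row_ptr, col_ind, blocks

def bcrs_spmv(m, x):
    """Compute y = A x with A stored in BCRS."""
    row_ptr, col_ind, blocks = m
    y = [0.0] * len(x)
    for bi in range(len(row_ptr) - 1):
        for k in range(row_ptr[bi], row_ptr[bi + 1]):
            bj, blk = col_ind[k], blocks[k]
            for r in range(BS):
                for c in range(BS):
                    y[bi * BS + r] += blk[r * BS + c] * x[bj * BS + c]
    return y

def bcrs_diagonal(m):
    """Extract the scalar diagonal of A from its diagonal blocks."""
    row_ptr, col_ind, blocks = m
    d = [0.0] * ((len(row_ptr) - 1) * BS)
    for bi in range(len(row_ptr) - 1):
        for k in range(row_ptr[bi], row_ptr[bi + 1]):
            if col_ind[k] == bi:
                for r in range(BS):
                    d[bi * BS + r] = blocks[k][r * BS + r]
    return d

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def jacobi_pcg(m, b, tol=1e-10, max_iter=1000):
    """Solve A x = b by conjugate gradients with the preconditioner M = diag(A)."""
    inv_diag = [1.0 / d for d in bcrs_diagonal(m)]  # assumes no zero on the diagonal
    x = [0.0] * len(b)
    r = list(b)                                      # residual for x = 0
    z = [di * ri for di, ri in zip(inv_diag, r)]     # preconditioned residual
    p = list(z)
    rz = dot(r, z)
    for _ in range(max_iter):
        if sqrt(dot(r, r)) <= tol:
            break
        ap = bcrs_spmv(m, p)
        alpha = rz / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        z = [di * ri for di, ri in zip(inv_diag, r)]
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x
```

As a usage example, packing the 4×4 tridiagonal matrix with 4 on the diagonal and -1 on the off-diagonals via `dense_to_bcrs` and calling `jacobi_pcg` with the right-hand side `[2.0, 4.0, 6.0, 13.0]` should recover a solution close to `[1, 2, 3, 4]`. Storing small dense blocks rather than individual coefficients means one column index is read per block instead of per entry, which is the property that register blocking and vectorization then build on.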