Journal of Parallel and Distributed Computing
In this work we describe parallel algorithms for solving nonlinear systems on a GPU (Graphics Processing Unit) using CUDA (Compute Unified Device Architecture). The proposed algorithms are based on both the Fletcher-Reeves version of the nonlinear conjugate gradient method and a polynomial-type preconditioner based on block two-stage methods. Several parallelization strategies and different storage formats for sparse matrices are discussed. The reported numerical experiments analyze the behavior of these algorithms in a fine-grained parallel environment compared with a thread-based environment.
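As a rough illustration of the two ingredients named above, the following sequential Python sketch combines a sparse matrix-vector product in CSR (compressed sparse row) storage with an unpreconditioned Fletcher-Reeves nonlinear conjugate gradient iteration. It is not the authors' implementation: the test problem (a mildly nonlinear system A x + x^3 = b with a small tridiagonal A), the backtracking Armijo line search, the periodic restart, and all function and variable names are assumptions made for this example, and the polynomial preconditioner is omitted. On a GPU, the row loop in `csr_matvec` is the part that would typically map to one thread per row.

```python
# Minimal sketch (assumed example, not the paper's code): Fletcher-Reeves
# nonlinear CG for A x + x^3 = b, with A stored in CSR format.

def csr_matvec(data, indices, indptr, x):
    """Return y = A @ x for A in CSR storage; rows are independent of each other."""
    y = [0.0] * (len(indptr) - 1)
    for row in range(len(y)):
        acc = 0.0
        for k in range(indptr[row], indptr[row + 1]):
            acc += data[k] * x[indices[k]]
        y[row] = acc
    return y

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def objective(csr, b, x):
    # f(x) = 0.5 x^T A x - b^T x + sum(x_i^4) / 4
    return 0.5 * dot(x, csr_matvec(*csr, x)) - dot(b, x) + sum(xi ** 4 for xi in x) / 4.0

def gradient(csr, b, x):
    # grad f(x) = A x + x^3 - b, so grad f(x) = 0 is the nonlinear system itself
    return [ax + xi ** 3 - bi for ax, xi, bi in zip(csr_matvec(*csr, x), x, b)]

def fletcher_reeves(csr, b, x0, tol=1e-5, max_iter=1000, c=0.1):
    """Return (x, iterations). Assumed choices: Armijo backtracking, restart every n steps."""
    n = len(x0)
    x = list(x0)
    g = gradient(csr, b, x)
    d = [-gi for gi in g]
    for it in range(max_iter):
        gg = dot(g, g)
        if gg ** 0.5 < tol:
            return x, it
        slope = dot(g, d)
        if it % n == 0 or slope >= 0.0:
            # Restart with steepest descent (periodically, or if d is not a descent direction)
            d = [-gi for gi in g]
            slope = -gg
        # Backtracking line search: halve alpha until the Armijo condition holds
        alpha, f0 = 1.0, objective(csr, b, x)
        while True:
            x_new = [xi + alpha * di for xi, di in zip(x, d)]
            if objective(csr, b, x_new) <= f0 + c * alpha * slope or alpha < 1e-16:
                break
            alpha *= 0.5
        x = x_new
        g_new = gradient(csr, b, x)
        beta = dot(g_new, g_new) / gg  # Fletcher-Reeves coefficient
        d = [-gn + beta * di for gn, di in zip(g_new, d)]
        g = g_new
    return x, max_iter

# Hypothetical 4x4 test matrix tridiag(-1, 4, -1) in CSR form: (data, indices, indptr)
A_CSR = (
    [4.0, -1.0, -1.0, 4.0, -1.0, -1.0, 4.0, -1.0, -1.0, 4.0],
    [0, 1, 0, 1, 2, 1, 2, 3, 2, 3],
    [0, 2, 5, 8, 10],
)
```

To try it, build a right-hand side from a known solution, for example `b = [a + t ** 3 for a, t in zip(csr_matvec(*A_CSR, x_true), x_true)]`, and call `fletcher_reeves(A_CSR, b, [0.0] * 4)`. CSR is only one of several possible sparse storage formats; it is used here because it keeps the per-row work explicit.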