Over the past decades, explicit finite element approximate inverse preconditioning methods have been used extensively to solve sparse linear systems efficiently on multiprocessor systems. The effectiveness of explicit approximate inverse preconditioning schemes relies on preconditioners that are close approximants to the coefficient matrix and fast to compute in parallel. New parallel computational techniques are proposed for the parallelization of the Optimized Banded Generalized Approximate Inverse Finite Element Matrix (OBGAIFEM) algorithm, based on the "fish bone" computational approach, and of the Explicit Preconditioned Conjugate Gradient type methods on a General Purpose Graphics Processing Unit (GPGPU). The proposed parallel methods have been implemented using the Compute Unified Device Architecture (CUDA) developed by NVIDIA. Numerical results are presented for the performance of finite element explicit approximate inverse preconditioning in solving characteristic two-dimensional boundary value problems on the massive multiprocessor interface of a GPU. CUDA implementation issues of the proposed methods are also discussed.
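To illustrate why explicit approximate inverse preconditioning suits data-parallel hardware, the following is a minimal sketch of a preconditioned conjugate gradient iteration in which the preconditioner M is applied by a matrix-vector product instead of a triangular solve. It is not the OBGAIFEM algorithm and not the paper's CUDA implementation: the function names (`matvec`, `dot`, `poisson_1d`, `explicit_pcg`), the 1D Poisson test matrix, and the choice of M as the inverse of the diagonal of A are all assumptions made for illustration, and plain Python lists stand in for GPU arrays.

```python
def matvec(A, v):
    # Dense matrix-vector product; every row is independent of the others.
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def dot(u, v):
    # Inner product (a reduction).
    return sum(a * b for a, b in zip(u, v))

def poisson_1d(n):
    # Hypothetical test matrix: tridiag(-1, 2, -1), symmetric positive definite.
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        A[i][i] = 2.0
        if i > 0:
            A[i][i - 1] = -1.0
        if i < n - 1:
            A[i][i + 1] = -1.0
    return A

def explicit_pcg(A, M, b, tol=1e-10, max_iter=200):
    # Assumes A and M are symmetric positive definite and M approximates inv(A).
    x = [0.0] * len(b)
    r = list(b)                 # residual b - A x for the zero initial guess
    z = matvec(M, r)            # explicit preconditioning: a product, not a solve
    p = list(z)
    rz = dot(r, z)
    for k in range(max_iter):
        if dot(r, r) ** 0.5 < tol:
            return x, k
        Ap = matvec(A, p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * qi for ri, qi in zip(r, Ap)]
        z = matvec(M, r)
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter

n = 8
A = poisson_1d(n)
# Assumed stand-in preconditioner: inverse of diag(A), a bandwidth-1 approximate inverse.
M = [[(1.0 / A[i][i]) if i == j else 0.0 for j in range(n)] for i in range(n)]
b = [1.0] * n
x, iters = explicit_pcg(A, M, b)
```

Each iteration consists only of matrix-vector products, inner products, and vector updates. In a GPU setting each of these would typically become a kernel or library call, which is the property that makes explicit preconditioners attractive compared with schemes that require sequential forward and backward substitution.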