On the GPGPU parallelization issues of finite element approximate inverse preconditioning

Authors:
C. K. Filelis-Papadopoulos; G. A. Gravvanis; P. I. Matskanidis; K. M. Giannoutakis
Affiliations:
Department of Electrical & Computer Engineering, School of Engineering, Democritus University of Thrace, University Campus, Kimmeria, GR 67100 Xanthi, Greece (C. K. Filelis-Papadopoulos, G. A. Gravvanis, P. I. Matskanidis); Centre for Research and Technology Hellas, Informatics and Telematics Institute, GR 57001 Thermi, Greece (K. M. Giannoutakis)
Venue:
Journal of Computational and Applied Mathematics
Year:
2011

Abstract

Over the past decades, explicit finite element approximate inverse preconditioning methods have been used extensively to solve sparse linear systems efficiently on multiprocessor systems. The effectiveness of explicit approximate inverse preconditioning schemes relies on preconditioners that closely approximate the inverse of the coefficient matrix and can be computed quickly in parallel. New parallel computational techniques are proposed for the Optimized Banded Generalized Approximate Inverse Finite Element Matrix (OBGAIFEM) algorithm, based on the "fish bone" computational approach, and for Explicit Preconditioned Conjugate Gradient type methods on a General Purpose Graphics Processing Unit (GPGPU). The proposed parallel methods have been implemented using the Compute Unified Device Architecture (CUDA) developed by NVIDIA. Numerical results are presented for the performance of finite element explicit approximate inverse preconditioning in solving characteristic two-dimensional boundary value problems on the massive multiprocessor interface of a GPU. The CUDA implementation issues of the proposed methods are also discussed.
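
As an illustration of why explicit approximate inverses suit parallel hardware, the following is a minimal sketch of an explicit preconditioned conjugate gradient iteration in plain Python. It is not the authors' CUDA implementation and does not reproduce OBGAIFEM. The names `tridiag`, `matvec`, `dot`, and `explicit_pcg` are hypothetical, the test matrix is a small 1D Poisson stencil chosen for brevity, and the banded matrix `M` is an assumed stand-in built from a two-term Neumann series. The point it shows is that the preconditioning step is a matrix-vector product `z = M r` rather than a triangular solve, so each row can be computed independently.

```python
def tridiag(n, lower, diag, upper):
    """Build a dense n x n tridiagonal matrix (kept dense only for brevity)."""
    T = [[0.0] * n for _ in range(n)]
    for i in range(n):
        T[i][i] = diag
        if i > 0:
            T[i][i - 1] = lower
        if i < n - 1:
            T[i][i + 1] = upper
    return T


def matvec(M, x):
    """Matrix-vector product; every row is independent (one thread per row on a GPU)."""
    return [sum(m_ij * x_j for m_ij, x_j in zip(row, x)) for row in M]


def dot(u, v):
    """Inner product; on a GPU this would be a parallel reduction."""
    return sum(a * b for a, b in zip(u, v))


def explicit_pcg(A, M, b, tol=1e-10, max_iter=200):
    """Preconditioned CG where M approximates inv(A) and is applied as z = M r."""
    x = [0.0] * len(b)
    r = list(b)              # residual b - A x for the zero initial guess
    z = matvec(M, r)         # explicit preconditioning: a mat-vec, not a solve
    p = list(z)
    rz = dot(r, z)
    for k in range(max_iter):
        if dot(r, r) ** 0.5 < tol:
            return x, k
        Ap = matvec(A, p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        z = matvec(M, r)
        rz_new = dot(r, z)
        beta = rz_new / rz
        p = [zi + beta * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter


n = 8
A = tridiag(n, -1.0, 2.0, -1.0)    # 1D Poisson stencil (assumed test problem)
# Assumed stand-in for an approximate inverse: with A = D - N and D = 2I,
# the two-term Neumann series D^-1 + D^-1 N D^-1 gives tridiag(1/4, 1/2, 1/4).
M = tridiag(n, 0.25, 0.5, 0.25)
b = matvec(A, [1.0] * n)           # right-hand side whose exact solution is all ones
x, iterations = explicit_pcg(A, M, b)
```

In this sketch every step of the loop is a mat-vec, an inner product, or a vector update, which are the kinds of operations that map directly onto data-parallel kernels.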