The finite element method (FEM) is a widely used numerical technique for approximating the solutions of partial differential equations (PDEs) in science and engineering applications. Many of these applications benefit from fast execution of the FEM pipeline. One way to accelerate the pipeline is to exploit modern computational hardware, in particular many-core streaming processors such as the graphics processing unit (GPU). In this paper, we present the algorithms and data structures necessary to move the entire FEM pipeline to the GPU. First, we propose an efficient GPU-based algorithm to generate local element information and to assemble the global linear system associated with the FEM discretization of an elliptic PDE. To solve the resulting linear system efficiently on the GPU, we implement a conjugate gradient method preconditioned with a geometry-informed algebraic multigrid (AMG) method. We propose a new fine-grained parallelism strategy, a corresponding multigrid cycling stage, and an efficient data mapping to the many-core architecture of the GPU. Compared with a traditional serial implementation on the CPU, our on-GPU assembly achieves a speedup of up to 87x. For the linear system solver alone, we achieve a speedup of up to 51x over a comparable state-of-the-art serial CPU linear system solver. Furthermore, the method compares favorably with other GPU-based sparse linear solvers.
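To make the two pipeline stages concrete, the following is a minimal serial sketch in Python, not the GPU implementation described in the paper. It assembles the global system for a 1D model problem, -u'' = 1 on (0, 1) with homogeneous Dirichlet boundary conditions, from per-element contributions, and then solves it with a preconditioned conjugate gradient loop. The model problem, the dense matrix storage, all function names, and the Jacobi (diagonal) preconditioner are assumptions chosen for brevity; in particular, the Jacobi preconditioner only stands in for the geometry-informed AMG preconditioner.

```python
def assemble_1d_poisson(n_elements):
    """Assemble A u = b for -u'' = 1 on (0, 1), u(0) = u(1) = 0,
    using linear elements on a uniform mesh (dense storage, illustration only)."""
    h = 1.0 / n_elements
    n = n_elements - 1  # number of interior (unknown) nodes
    A = [[0.0] * n for _ in range(n)]
    b = [0.0] * n
    # Local element stiffness matrix and load vector for a linear element.
    k_local = [[1.0 / h, -1.0 / h], [-1.0 / h, 1.0 / h]]
    f_local = [h / 2.0, h / 2.0]
    for e in range(n_elements):
        # Element e joins global nodes e and e + 1; interior index = global - 1.
        nodes = (e - 1, e)
        for a in range(2):
            i = nodes[a]
            if not 0 <= i < n:
                continue  # boundary node: value fixed at zero
            b[i] += f_local[a]
            for c in range(2):
                j = nodes[c]
                if 0 <= j < n:
                    A[i][j] += k_local[a][c]
    return A, b


def matvec(A, x):
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]


def dot(x, y):
    return sum(x_i * y_i for x_i, y_i in zip(x, y))


def jacobi_preconditioner(A):
    """Diagonal scaling; a simple stand-in for an AMG preconditioner."""
    diag = [A[i][i] for i in range(len(A))]
    return lambda r: [r_i / d_i for r_i, d_i in zip(r, diag)]


def pcg(A, b, apply_preconditioner, tol=1e-10, max_iter=1000):
    """Preconditioned conjugate gradient for a symmetric positive definite A."""
    x = [0.0] * len(b)
    r = list(b)                      # residual for the zero initial guess
    z = apply_preconditioner(r)
    p = list(z)
    rz = dot(r, z)
    b_norm = dot(b, b) ** 0.5 or 1.0
    for iteration in range(max_iter):
        if dot(r, r) ** 0.5 <= tol * b_norm:
            return x, iteration
        Ap = matvec(A, p)
        alpha = rz / dot(p, Ap)
        x = [x_i + alpha * p_i for x_i, p_i in zip(x, p)]
        r = [r_i - alpha * ap_i for r_i, ap_i in zip(r, Ap)]
        z = apply_preconditioner(r)
        rz_new = dot(r, z)
        beta = rz_new / rz
        p = [z_i + beta * p_i for z_i, p_i in zip(z, p)]
        rz = rz_new
    return x, max_iter


A, b = assemble_1d_poisson(8)
u, iterations = pcg(A, b, jacobi_preconditioner(A))
```

The per-element loop in `assemble_1d_poisson` and the matrix-vector product inside `pcg` are the parts a GPU implementation would parallelize; a real solver would also use a sparse matrix format instead of dense lists.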