Co-processor acceleration of an unmodified parallel solid mechanics code with FEASTGPU

Authors:
Dominik Göddeke;Hilmar Wobker;Robert Strzodka;Jamaludin Mohd-Yusof;Patrick McCormick;Stefan Turek
Affiliations:
Institute of Applied Mathematics, TU Dortmund, Dortmund, Germany;Institute of Applied Mathematics, TU Dortmund, Dortmund, Germany;Max Planck Center, Max Planck Institut Informatik, Saarbrücken, Germany;Computer, Computational and Statistical Sciences Division, Los Alamos National Laboratory, USA;Computer, Computational and Statistical Sciences Division, Los Alamos National Laboratory, USA;Institute of Applied Mathematics, TU Dortmund, Dortmund, Germany
Venue:
International Journal of Computational Science and Engineering
Year:
2009

Citing 17
Cited 5

Abstract

We have previously presented an approach for including graphics processing units as co-processors in FEAST, a parallel finite element multigrid solver. In this paper we show that the acceleration transfers to real applications built on top of FEAST, without any modification of the application code. The chosen solid mechanics code is well suited to assessing the practicability of our approach because of its higher accuracy requirements and its more diverse CPU/co-processor interaction. We demonstrate in detail that the single precision execution on the co-processor does not affect the final accuracy, and we analyse how local acceleration factors of 5.5 to 9.0 translate into a 1.6- to 2.6-fold total speed-up.
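The gap between local and total speed-up can be illustrated with a simple Amdahl-style model, in which only a fraction of the runtime is offloaded to the co-processor. The sketch below is an assumption for illustration, not the analysis carried out in the paper: the function names are hypothetical, and pairing the lower local factor (5.5) with the lower total speed-up (1.6), and the upper with the upper, is a guess that the abstract does not confirm.

```python
def total_speedup(accel_fraction: float, local_speedup: float) -> float:
    """Amdahl-style estimate of the overall speed-up.

    accel_fraction: share of the original runtime that is accelerated (0..1).
    local_speedup:  speed-up factor achieved on that accelerated share.
    """
    if not 0.0 <= accel_fraction <= 1.0:
        raise ValueError("accel_fraction must lie in [0, 1]")
    if local_speedup <= 0.0:
        raise ValueError("local_speedup must be positive")
    # The unaccelerated share keeps its runtime; the rest shrinks by local_speedup.
    return 1.0 / ((1.0 - accel_fraction) + accel_fraction / local_speedup)


def implied_fraction(total: float, local_speedup: float) -> float:
    """Invert the model: which accelerated share yields the given total speed-up?"""
    if total <= 0.0 or local_speedup <= 0.0 or local_speedup == 1.0:
        raise ValueError("need total > 0 and local_speedup > 0, != 1")
    return (1.0 - 1.0 / total) / (1.0 - 1.0 / local_speedup)


if __name__ == "__main__":
    # Hypothetical pairing of the ranges quoted in the abstract.
    for local, total in [(5.5, 1.6), (9.0, 2.6)]:
        share = implied_fraction(total, local)
        print(f"local {local}x, total {total}x -> accelerated share {share:.3f}")
```

Under this model, and under the assumed pairing, the quoted figures would correspond to roughly half to two thirds of the runtime being accelerated. The model ignores data transfer and synchronisation between CPU and co-processor, so it should be read as a rough plausibility check only.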