Using GPUs to improve multigrid solver performance on a cluster

Authors:
Dominik Göddeke;Robert Strzodka;Jamaludin Mohd-Yusof;Patrick McCormick;Hilmar Wobker;Christian Becker;Stefan Turek
Affiliations:
Institut für Angewandte Mathematik, TU Dortmund, Germany.;Max Planck Center, Max Planck Institut Informatik, Saarbrücken, Germany.;Computer, Computational and Statistical Sciences Division, Los Alamos National Laboratory, USA.;Computer, Computational and Statistical Sciences Division, Los Alamos National Laboratory, USA.;Institut für Angewandte Mathematik, TU Dortmund, Germany.;Institut für Angewandte Mathematik, TU Dortmund, Germany.;Institut für Angewandte Mathematik, TU Dortmund, Germany.
Venue:
International Journal of Computational Science and Engineering
Year:
2008

Abstract

This paper explores the coupling of coarse- and fine-grained parallelism for Finite Element (FE) simulations based on efficient parallel multigrid solvers. The focus lies on both system performance and a minimally invasive integration of hardware acceleration into an existing software package, requiring no changes to application code. We demonstrate the viability of our approach using commodity Graphics Processing Units (GPUs), chosen for their excellent price-performance ratio, and address the limited precision of GPUs by applying a mixed precision iterative refinement technique. Our results show that we do not compromise any software functionality and gain speedups of a factor of two or more for large problems.
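
The mixed precision iterative refinement idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a small 1D Poisson-type model matrix tridiag(-1, 2, -1), and the low-precision inner solver is a Thomas (tridiagonal) solve with every intermediate rounded to single precision, used here only as a stand-in for the GPU multigrid solver the abstract describes. All function names (`f32`, `apply_A`, `solve_low_precision`, `mixed_precision_solve`) are hypothetical, and single precision is emulated on the CPU via the standard `struct` module.

```python
import math
import struct


def f32(x):
    """Round a double to the nearest IEEE single-precision value (emulation)."""
    return struct.unpack("f", struct.pack("f", x))[0]


def apply_A(x):
    """Double-precision product A x for the model matrix tridiag(-1, 2, -1)."""
    n = len(x)
    y = []
    for i in range(n):
        left = x[i - 1] if i > 0 else 0.0
        right = x[i + 1] if i < n - 1 else 0.0
        y.append(2.0 * x[i] - left - right)
    return y


def solve_low_precision(r):
    """Approximately solve A c = r with the Thomas algorithm in emulated float32.

    Assumption: this stands in for the low-precision accelerator solver.
    """
    n = len(r)
    cp = [0.0] * n  # modified super-diagonal
    dp = [0.0] * n  # modified right-hand side
    cp[0] = f32(-1.0 / 2.0)
    dp[0] = f32(f32(r[0]) / 2.0)
    for i in range(1, n):
        denom = f32(2.0 + cp[i - 1])
        cp[i] = f32(-1.0 / denom)
        dp[i] = f32(f32(f32(r[i]) + dp[i - 1]) / denom)
    c = [0.0] * n
    c[n - 1] = dp[n - 1]
    for i in range(n - 2, -1, -1):
        c[i] = f32(dp[i] - f32(cp[i] * c[i + 1]))
    return c


def mixed_precision_solve(b, tol=1e-12, max_iter=20):
    """Outer refinement loop: residual and update in double, correction in float32."""
    x = [0.0] * len(b)
    b_norm = max(abs(v) for v in b)
    for it in range(max_iter):
        r = [bi - yi for bi, yi in zip(b, apply_A(x))]  # high-precision residual
        if max(abs(v) for v in r) <= tol * b_norm:
            return x, it
        c = solve_low_precision(r)                      # low-precision correction
        x = [xi + ci for xi, ci in zip(x, c)]           # high-precision update
    return x, max_iter


# Small demonstration with a manufactured solution (illustrative data only).
n = 64
x_true = [math.sin(i + 1) for i in range(n)]
b = apply_A(x_true)

x_mixed, iters = mixed_precision_solve(b)
x_low = solve_low_precision(b)

err_mixed = max(abs(u - v) for u, v in zip(x_mixed, x_true))
err_low = max(abs(u - v) for u, v in zip(x_low, x_true))
```

Comparing `err_mixed` with `err_low` shows the point of the technique: the bulk of the arithmetic runs in low precision, while the double-precision residual and update steps recover accuracy that a single low-precision solve cannot reach.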