High-order finite-element seismic wave propagation modeling with MPI on a large GPU cluster

Authors:
Dimitri Komatitsch; Gordon Erlebacher; Dominik Göddeke; David Michéa
Affiliations:
Université de Pau et des Pays de l'Adour, CNRS & INRIA Magique-3D, Laboratoire de Modélisation et d'Imagerie en Géosciences UMR 5212, Avenue de l'Université, 64013 Pau Ce ...; Department of Scientific Computing, Florida State University, Tallahassee 32306, USA; Institut für Angewandte Mathematik, TU Dortmund, Germany; Université de Pau et des Pays de l'Adour, CNRS & INRIA Magique-3D, Laboratoire de Modélisation et d'Imagerie en Géosciences UMR 5212, Avenue de l'Université, 64013 Pau Ce ...
Venue:
Journal of Computational Physics
Year:
2010

Abstract

We implement, on a large cluster of NVIDIA Tesla graphics cards, a high-order finite-element application that simulates seismic wave propagation resulting, for instance, from earthquakes at the scale of a continent or from active seismic acquisition experiments in the oil industry. The implementation uses the CUDA programming environment and non-blocking message passing based on MPI. Unlike many finite-element implementations, ours runs successfully in single precision, which maximizes the performance of current-generation GPUs. We discuss the implementation and optimization of the code and compare it to an existing, highly optimized implementation in C and MPI on a classical cluster of CPU nodes. We use mesh coloring to handle summation operations over degrees of freedom on an unstructured mesh efficiently, and non-blocking MPI messages to overlap both the communications across the network and the data transfers to and from the device via PCIe with calculations on the GPU. We perform a number of numerical tests to validate the single-precision CUDA and MPI implementation and assess its accuracy. We then analyze performance measurements and obtain a speedup of 20x or 12x, depending on how the problem is mapped onto the reference CPU cluster.
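The abstract does not spell out how mesh coloring is used. A minimal Python sketch follows, assuming the common approach of a greedy coloring in which no two elements of the same color share a global node. Within one color, per-element contributions can then be summed into the global degrees of freedom without write conflicts, so on a GPU each element of a color could be handled by its own thread without atomic operations. The functions `color_elements` and `assemble_by_color` and the tiny 2 x 2 quadrilateral mesh are hypothetical illustrations, not the authors' code; a real spectral-element mesh would use hexahedra with many nodes per element.

```python
from collections import defaultdict


def color_elements(elements):
    """Greedy mesh coloring: elements that share a global node get different colors."""
    node_to_elems = defaultdict(list)
    for e, nodes in enumerate(elements):
        for n in nodes:
            node_to_elems[n].append(e)

    colors = [-1] * len(elements)
    for e, nodes in enumerate(elements):
        # Colors already assigned to any element sharing at least one node with e.
        taken = {colors[f] for n in nodes for f in node_to_elems[n] if colors[f] >= 0}
        c = 0
        while c in taken:
            c += 1
        colors[e] = c
    return colors


def assemble_by_color(elements, local_values, n_nodes, colors):
    """Sum per-element contributions into global degrees of freedom, one color at a time."""
    global_values = [0.0] * n_nodes
    for c in range(max(colors) + 1):
        batch = [e for e, ce in enumerate(colors) if ce == c]
        # Within one color no two elements touch the same node, so on a GPU each
        # element of the batch could be processed by its own thread with plain
        # (non-atomic) additions; here the batch is simply looped over.
        for e in batch:
            for n, v in zip(elements[e], local_values[e]):
                global_values[n] += v
    return global_values


# 2 x 2 mesh of bilinear quads on a 3 x 3 grid of nodes numbered row by row.
elements = [[0, 1, 4, 3], [1, 2, 5, 4], [3, 4, 7, 6], [4, 5, 8, 7]]
colors = color_elements(elements)
local = [[1.0] * 4 for _ in elements]
print(colors)                                          # [0, 1, 2, 3]
print(assemble_by_color(elements, local, 9, colors))   # [1.0, 2.0, 1.0, 2.0, 4.0, 2.0, 1.0, 2.0, 1.0]
```

Here all four elements share the central node, so four colors are needed and each global value simply counts the elements touching that node. Greedy coloring does not guarantee the minimum number of colors, but it is cheap and, as long as each color still contains many elements, it exposes ample parallelism.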
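One common way to obtain the overlap of communication and computation mentioned in the abstract (an assumption here, since the abstract does not detail the scheme) is to split each partition into "outer" elements, which touch nodes shared with a neighboring partition, and "inner" elements, which do not. The outer elements are computed first so that their interface sums are final, the exchange is posted, and the inner elements are computed while the message is in flight. The Python sketch mimics only this ordering on the CPU: two threads stand in for two MPI ranks, `queue.Queue` objects stand in for MPI messages, and a `concurrent.futures` future plays the role of a non-blocking receive. `rank_step` and the two-partition chain mesh are hypothetical; in the GPU setting, the PCIe copies of the interface buffers would be hidden behind the interior work in the same way.

```python
import queue
from concurrent.futures import ThreadPoolExecutor


def rank_step(elements, local_values, n_nodes, interface, send_box, recv_box):
    """Assemble one partition, overlapping the interface exchange with interior work.

    Hypothetical sketch: queues stand in for MPI messages and a future for a
    non-blocking receive; this is not the MPI or CUDA API itself.
    """
    shared = set(interface)
    outer = [e for e, nodes in enumerate(elements) if not shared.isdisjoint(nodes)]
    inner = [e for e, nodes in enumerate(elements) if shared.isdisjoint(nodes)]
    acc = [0.0] * n_nodes

    def add(e):
        # Scatter-add the local contributions of element e into the partition's nodes.
        for n, v in zip(elements[e], local_values[e]):
            acc[n] += v

    with ThreadPoolExecutor(max_workers=1) as comm:
        for e in outer:                              # 1. outer elements first
            add(e)
        # Interface sums are final now, because inner elements never touch them.
        send_box.put([acc[n] for n in interface])    # 2. "send" them (cf. MPI_Isend)
        pending = comm.submit(recv_box.get)          #    and post the "receive" (cf. MPI_Irecv)
        for e in inner:                              # 3. interior work overlaps the exchange
            add(e)
        received = pending.result()                  # 4. wait for the neighbor (cf. MPI_Wait)
    for n, v in zip(interface, received):            # 5. add the neighbor's contributions
        acc[n] += v
    return acc


# A chain of 8 two-node elements split into two partitions of 4 elements each.
# Both use local node numbers 0..4; the shared global node is local node 4 on the
# left partition and local node 0 on the right one.
elems = [[i, i + 1] for i in range(4)]
ones = [[1.0, 1.0] for _ in elems]
left_to_right, right_to_left = queue.Queue(), queue.Queue()
with ThreadPoolExecutor(max_workers=2) as ranks:
    left = ranks.submit(rank_step, elems, ones, 5, [4], left_to_right, right_to_left)
    right = ranks.submit(rank_step, elems, ones, 5, [0], right_to_left, left_to_right)
print(left.result())   # [1.0, 2.0, 2.0, 2.0, 2.0]
print(right.result())  # [2.0, 2.0, 2.0, 2.0, 1.0]
```

Concatenating the two partitions (and dropping the duplicated shared node) reproduces the single-domain sums, with 1 at the two ends and 2 everywhere else. Each partition's receive runs on its own helper thread, so neither "rank" can block the other while it waits.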