A flexible Patch-based lattice Boltzmann parallelization approach for heterogeneous GPU-CPU clusters

Authors:
Christian Feichtinger;Johannes Habich;Harald Köstler;Georg Hager;Ulrich Rüde;Gerhard Wellein
Affiliations:
Chair for System Simulation, University of Erlangen-Nuremberg, Germany;Erlangen Regional Computing Center, University of Erlangen-Nuremberg, Germany;Chair for System Simulation, University of Erlangen-Nuremberg, Germany;Erlangen Regional Computing Center, University of Erlangen-Nuremberg, Germany;Chair for System Simulation, University of Erlangen-Nuremberg, Germany;Erlangen Regional Computing Center, University of Erlangen-Nuremberg, Germany
Venue:
Parallel Computing
Year:
2011

Abstract

Sustaining a large fraction of single-GPU performance in parallel computations is considered to be the major problem of GPU-based clusters. We address this issue in the context of a lattice Boltzmann flow solver that is integrated in the WaLBerla software framework. Our multi-GPU implementation uses a block-structured MPI parallelization and is suitable for load balancing and heterogeneous computations on CPUs and GPUs. The overhead required for multi-GPU simulations is discussed in detail. It is demonstrated that a large fraction of the kernel performance can be sustained for weak scaling on InfiniBand clusters, leading to excellent parallel efficiency. However, in strong scaling scenarios, using multiple GPUs is much less efficient than running CPU-only simulations on IBM BG/P and x86-based clusters. Hence, a cost analysis must determine the best course of action for a particular simulation task and hardware configuration. Finally, we present weak scaling results of heterogeneous simulations conducted on CPUs and GPUs simultaneously, using clusters equipped with varying node configurations.
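
The abstract contrasts weak and strong scaling. As a minimal sketch of the standard textbook definitions of parallel efficiency in the two regimes, the following Python snippet may help. The function names and all numbers are hypothetical illustrations, not taken from the paper or from the WaLBerla code.

```python
def weak_scaling_efficiency(perf_one, perf_n, n):
    """Weak scaling: work per process is fixed, total work grows with n.

    perf_one: throughput of one process (e.g. lattice updates per second)
    perf_n:   aggregate throughput of n processes
    Returns the fraction of the ideal n-fold throughput that is sustained.
    """
    return perf_n / (n * perf_one)


def strong_scaling_efficiency(time_one, time_n, n):
    """Strong scaling: total problem size is fixed, work per process shrinks.

    time_one: runtime on one process
    time_n:   runtime on n processes
    Returns the fraction of the ideal n-fold speedup that is achieved.
    """
    return time_one / (n * time_n)


# Hypothetical numbers, chosen only to illustrate the two definitions.
weak = weak_scaling_efficiency(perf_one=100.0, perf_n=720.0, n=8)
strong = strong_scaling_efficiency(time_one=80.0, time_n=20.0, n=8)
print(f"weak scaling efficiency:   {weak:.2f}")
print(f"strong scaling efficiency: {strong:.2f}")
```

In strong scaling the subdomain per process shrinks as processes are added, so fixed per-step costs such as communication make up a larger share of the runtime. This is one common reason why efficiency drops faster there than in weak scaling.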