A novel shared-memory thread-pool implementation for hybrid parallel CFD solvers

Authors:
Jens Jägersküpper;Christian Simmendinger
Affiliations:
German Aerospace Center, Institute of Aerodynamics and Flow Technology, Center of Computer Applications in Aerospace Science and Engineering, Germany;T-Systems Solution for Research, Stuttgart, Germany
Venue:
Euro-Par'11 Proceedings of the 17th international conference on Parallel processing - Volume Part II
Year:
2011

Abstract

The Computational Fluid Dynamics (CFD) solver TAU for unstructured grids is widely used in the European aerospace industry. TAU runs on High-Performance Computing (HPC) clusters with several thousands of cores using MPI-based domain decomposition. To make more efficient use of current multi-core CPUs and to prepare TAU for the many-core era, a shared-memory parallelization has been added to one of TAU's solvers to obtain a hybrid parallelization: MPI-based domain decomposition plus multi-threaded processing of a domain. For the edge-based solver considered, a simple loop-based approach via OpenMP FOR directives would, due to the Amdahl trap, not deliver the required speed-up. A more sophisticated, thread-pool-based shared-memory parallelization has been developed which allows for a relaxed thread synchronization with automatic and dynamic load balancing. In this paper we describe the concept behind this shared-memory parallelization and explain how the multi-threaded computation of a domain works. Some details of its implementation in TAU as well as some first performance results are presented. We emphasize that the concept is not TAU-specific. Actually, this design pattern appears to be very generic and may well be applied to other grid/mesh/graph-based codes.
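
The general idea of a thread pool with dynamic load balancing over the edges of a domain can be illustrated with a minimal sketch. This is not the TAU implementation, and the abstract does not give its details. The function name `process_edges_with_pool`, the toy "flux" update, the block size, and the use of thread-private accumulators with a single final reduction are all assumptions made for illustration only.

```python
import queue
import threading


def process_edges_with_pool(num_nodes, edges, num_threads=4, block_size=2):
    """Accumulate a toy per-node 'flux' over all edges with a pool of threads.

    Hypothetical illustration only: each edge (i, j, w) adds w to node i
    and subtracts w from node j.
    """
    # Split the edge list into small blocks. Threads pull blocks on demand,
    # so a thread that finishes early simply takes more work
    # (dynamic load balancing).
    tasks = queue.Queue()
    for start in range(0, len(edges), block_size):
        tasks.put(edges[start:start + block_size])

    result = [0.0] * num_nodes
    result_lock = threading.Lock()

    def worker():
        # Thread-private accumulator: no locking inside the edge loop.
        local = [0.0] * num_nodes
        while True:
            try:
                block = tasks.get_nowait()
            except queue.Empty:
                break  # no work left for this thread
            for i, j, w in block:
                local[i] += w
                local[j] -= w
        # Threads synchronize only once, when merging their partial sums.
        with result_lock:
            for node, value in enumerate(local):
                result[node] += value

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result


# A triangle of three nodes with weighted edges (made-up data).
example_edges = [(0, 1, 1.0), (1, 2, 2.0), (2, 0, 3.0)]
example_result = process_edges_with_pool(3, example_edges, num_threads=2)
print(example_result)  # [-2.0, 1.0, 1.0]
```

In this sketch there is no barrier after each block of edges. Threads meet only at the final reduction, which is one simple way to keep synchronization loose. A production solver would more likely avoid the per-thread copy of the node array, for example by partitioning the domain so that concurrently processed blocks do not touch the same nodes.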