A novel shared-memory thread-pool implementation for hybrid parallel CFD solvers

  • Authors:
  • Jens Jägersküpper;Christian Simmendinger

  • Affiliations:
  • German Aerospace Center, Institute of Aerodynamics and Flow Technology, Center of Computer Applications in Aerospace Science and Engineering, Germany;T-Systems Solution for Research, Stuttgart, Germany

  • Venue:
  • Euro-Par'11 Proceedings of the 17th international conference on Parallel processing - Volume Part II
  • Year:
  • 2011

Abstract

The Computational Fluid Dynamics (CFD) solver TAU for unstructured grids is widely used in the European aerospace industry. TAU runs on High-Performance Computing (HPC) clusters with several thousand cores, using MPI-based domain decomposition. To make more efficient use of current multi-core CPUs and to prepare TAU for the many-core era, a shared-memory parallelization has been added to one of TAU's solvers, yielding a hybrid parallelization: MPI-based domain decomposition plus multi-threaded processing of each domain. For the edge-based solver considered, a simple loop-based approach using OpenMP FOR directives would not deliver the required speed-up because of the Amdahl trap. Instead, a more sophisticated, thread-pool-based shared-memory parallelization has been developed, which allows for relaxed thread synchronization with automatic and dynamic load balancing. In this paper we describe the concept behind this shared-memory parallelization and explain how the multi-threaded computation of a domain works. Some details of its implementation in TAU, as well as first performance results, are presented. We emphasize that the concept is not TAU-specific; in fact, this design pattern appears to be very generic and may well be applied to other grid-, mesh- or graph-based codes.
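The abstract contrasts a static, loop-based OpenMP FOR split with a thread pool that balances load automatically and dynamically. The Python sketch below illustrates only the generic dynamic self-scheduling idea behind such load balancing, under our own assumptions: the function name `process_chunks`, the chunk granularity and the lock-protected counter are hypothetical and are not taken from TAU. It omits the relaxed thread synchronization described in the paper, and because of CPython's global interpreter lock it demonstrates the scheduling pattern rather than real parallel speed-up.

```python
import threading


def process_chunks(num_chunks, work_fn, num_threads=4):
    """Process chunk indices 0..num_chunks-1 with a small pool of threads.

    Hypothetical sketch: instead of assigning each thread a fixed range up
    front (static scheduling), every worker repeatedly claims the next
    unprocessed chunk from a shared counter. Threads that finish early
    simply claim more chunks, which balances the load dynamically.
    Returns (results, claimed_by), where claimed_by[i] is the id of the
    thread that processed chunk i.
    """
    next_chunk = 0
    lock = threading.Lock()          # protects the shared chunk counter
    results = [None] * num_chunks
    claimed_by = [None] * num_chunks

    def worker(tid):
        nonlocal next_chunk
        while True:
            # Atomically claim the next chunk index.
            with lock:
                i = next_chunk
                next_chunk += 1
            if i >= num_chunks:
                return               # no work left for this thread
            results[i] = work_fn(i)  # each chunk is written by exactly one thread
            claimed_by[i] = tid

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()                     # wait until all chunks are done
    return results, claimed_by


if __name__ == "__main__":
    res, owner = process_chunks(10, lambda i: i * i, num_threads=3)
    print(res)    # chunk results in index order
    print(owner)  # which thread handled each chunk (varies between runs)
```

The counter plays the role an atomic fetch-and-add would play in a compiled implementation; the only point of synchronization is the claim of a chunk, rather than a fixed partition decided before the loop starts.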