A parallel ghosting algorithm for the flexible distributed mesh database

  • Authors:
  • Misbah Mubarak, Seegyoung Seol, Qiukai Lu, Mark S. Shephard

  • Affiliations:
  • Scientific Computation Research Center, Rensselaer Polytechnic Institute, 110 8th St, Troy, NY 12180, USA. E-mails: {mubarm, seols, luq3, shephard}@rpi.edu

  • Venue:
  • Scientific Programming
  • Year:
  • 2013

Abstract

Critical to the scalability of parallel adaptive simulations are parallel control functions, including load balancing, reduced inter-process communication and optimal data decomposition. On distributed meshes, many mesh-based applications frequently access neighborhood information for computational purposes; when the neighbors reside on different processors, that information must be transmitted efficiently to avoid parallel performance degradation. This article presents a parallel algorithm for creating and deleting data copies, referred to as ghost copies, which localize neighborhood data for computation while minimizing inter-process communication. The key characteristics of the algorithm are: (1) it can create ghost copies of any permissible topological order in a 1D, 2D or 3D mesh based on selected adjacencies; (2) it exploits neighborhood communication patterns during the ghost creation process, thus eliminating all-to-all communication; (3) for applications that need neighbors of neighbors, it can create n ghost layers, up to the point where the whole partitioned mesh is ghosted. Strong and weak scaling results are presented for the IBM BG/P and Cray XE6 architectures on up to 32,768 cores. The algorithm also yields scalable results when used in a parallel superconvergent patch recovery error estimator, an application that frequently accesses neighborhood data to carry out its computation.
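
The abstract's central communication claim, that ghost creation uses neighborhood (point-to-point) exchange rather than all-to-all communication, can be illustrated with a small toy program. The C++/MPI sketch below is not the FMDB implementation; it assumes a trivially partitioned 1D mesh in which each part owns a contiguous block of element ids, and it builds one ghost layer by exchanging boundary ids only with the two neighboring parts. All names (elemsPerPart, ghostFromLeft, ...) are illustrative assumptions.

```cpp
// Toy sketch of one ghost-layer round on a 1D partitioned "mesh".
// Each rank owns elemsPerPart contiguous element ids and receives
// ghost copies of its neighbors' boundary elements via point-to-point
// exchange -- no all-to-all collective is involved. Hypothetical
// example; it does not use the FMDB API.
#include <mpi.h>
#include <cstdio>

int main(int argc, char** argv) {
  MPI_Init(&argc, &argv);
  int rank, size;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &size);

  const int elemsPerPart = 4;            // owned elements per part (assumed)
  int first = rank * elemsPerPart;       // global id of first owned element
  int last  = first + elemsPerPart - 1;  // global id of last owned element

  // Part neighbors in the 1D partition; MPI_PROC_NULL turns the
  // exchange at the domain ends into a no-op.
  int left  = (rank > 0)        ? rank - 1 : MPI_PROC_NULL;
  int right = (rank < size - 1) ? rank + 1 : MPI_PROC_NULL;

  int ghostFromLeft = -1, ghostFromRight = -1;

  // Send my rightmost element id to the right neighbor while receiving
  // the left neighbor's rightmost id, then the mirror exchange. Only
  // the two part neighbors communicate.
  MPI_Sendrecv(&last, 1, MPI_INT, right, 0,
               &ghostFromLeft, 1, MPI_INT, left, 0,
               MPI_COMM_WORLD, MPI_STATUS_IGNORE);
  MPI_Sendrecv(&first, 1, MPI_INT, left, 1,
               &ghostFromRight, 1, MPI_INT, right, 1,
               MPI_COMM_WORLD, MPI_STATUS_IGNORE);

  std::printf("rank %d owns [%d..%d], ghost ids: left=%d right=%d\n",
              rank, first, last, ghostFromLeft, ghostFromRight);

  MPI_Finalize();
  return 0;
}
```

Repeating the exchange, each time forwarding the ids received in the previous round, would accumulate the n ghost layers the abstract mentions; the actual algorithm generalizes this pattern to arbitrary topological orders and selected adjacencies on unstructured 2D and 3D meshes.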