Hierarchical hybrid grids: achieving TERAFLOP performance on large scale finite element simulations

  • Authors:
  • B. Bergen; G. Wellein; F. Hülsemann; U. Rüde

  • Affiliations:
  • Los Alamos National Laboratory, CCS-2 Continuum Dynamics, Los Alamos, NM, USA; Regionales Rechenzentrum Erlangen, Universität Erlangen, Erlangen, Germany; Departement SINETICS, EDF R&D, Clamart Cedex, France; Friedrich-Alexander-Universität, Erlangen, Germany

  • Venue:
  • International Journal of Parallel, Emergent and Distributed Systems
  • Year:
  • 2007

Abstract

The design of the hierarchical hybrid grids (HHG) framework is motivated by the desire to achieve high performance in large-scale parallel finite element simulations on supercomputers. Realizing this goal requires careful analysis of the low-level, computationally intensive algorithms used to implement the library. This analysis is primarily concerned with identifying and removing bottlenecks that limit the serial performance of multigrid component algorithms such as smoothing and residual calculation. To aid this investigation, two metrics have been developed: the balance metric (BM) and the loads-per-miss metric (LPMM). Each metric makes assumptions about how the data structures and algorithms interact with the memory subsystem and processor of the architecture on which they run. Applying these metrics yields performance predictions that can be compared with measured results to determine the actual behavior of an algorithm/data-structure combination on a given platform; this information can then be used to improve performance. In this paper, we first present an overview of the HHG framework. Next, we introduce the two performance metrics in detail. These metrics are then applied to three different data structures used to implement a Gauß-Seidel smoothing algorithm. Performance results and an interpretation of the underlying interactions of the data structures with several relevant supercomputing architectures are given. Finally, we present a brief discussion of some performance results of the HHG framework, followed by concluding remarks.
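To make the smoother discussed in the abstract concrete, the following is a minimal sketch of a Gauß-Seidel sweep and a residual calculation for a 1D Poisson problem on a uniform grid. This toy example is not the HHG implementation (which operates on hierarchical hybrid grid patches in parallel); the function names, the 3-point stencil, and the problem setup are illustrative assumptions. It does, however, show the defining property that matters for the memory-access analysis: each updated value is reused immediately within the same sweep.

```python
def gauss_seidel_sweep(u, f, h):
    # One in-place Gauss-Seidel sweep for -u'' = f on a uniform 1D grid
    # with fixed Dirichlet boundary values u[0] and u[-1]. Unlike Jacobi,
    # freshly updated neighbors (u[i-1]) are used immediately, which
    # changes the load/store pattern seen by the memory subsystem.
    for i in range(1, len(u) - 1):
        u[i] = 0.5 * (u[i - 1] + u[i + 1] + h * h * f[i])
    return u


def residual_norm(u, f, h):
    # Max-norm of the residual r = f - A u for the 3-point Laplacian,
    # the "residual calculation" component mentioned in the abstract.
    return max(
        abs(f[i] - (-u[i - 1] + 2.0 * u[i] - u[i + 1]) / (h * h))
        for i in range(1, len(u) - 1)
    )


if __name__ == "__main__":
    n = 17                      # grid points, including both boundaries
    h = 1.0 / (n - 1)
    u = [0.0] * n               # zero initial guess, zero Dirichlet BCs
    f = [1.0] * n               # constant right-hand side
    for _ in range(500):
        gauss_seidel_sweep(u, f, h)
    # For -u'' = 1 on [0,1] the discrete solution is u(x) = x(1-x)/2,
    # so the midpoint value converges toward 0.125.
    print(u[n // 2])
```

In HHG, the interesting question is not this arithmetic but how the loop's data layout interacts with caches, which is exactly what the BM and LPMM metrics are designed to predict for the three candidate data structures.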