Hierarchical hybrid grids: achieving TERAFLOP performance on large scale finite element simulations

  • Authors:
  • B. Bergen; G. Wellein; F. Hülsemann; U. Rüde

  • Affiliations:
  • Los Alamos National Laboratory, CCS-2 Continuum Dynamics, Los Alamos, NM, USA; Regionales Rechenzentrum Erlangen, Universität Erlangen, Erlangen, Germany; Departement SINETICS, EDF R&D, Clamart Cedex, France; Friedrich-Alexander-Universität, Erlangen, Germany

  • Venue:
  • International Journal of Parallel, Emergent and Distributed Systems
  • Year:
  • 2007

Abstract

The design of the hierarchical hybrid grids (HHG) framework is motivated by the desire to achieve high performance in large-scale parallel finite element simulations on supercomputers. Realizing this goal requires careful analysis of the low-level, computationally intensive algorithms used to implement the library. This analysis is primarily concerned with identifying and removing bottlenecks that limit the serial performance of multigrid component algorithms such as smoothing and residual calculation. To aid this investigation, two metrics have been developed: the balance metric (BM) and the loads-per-miss metric (LPMM). Each metric makes assumptions about how the data structures and algorithms interact with the memory subsystem and processor of the architecture on which they run. Applying these metrics yields performance predictions that can be compared with measured results to determine the actual behavior of an algorithm/data-structure combination on a given platform; this information can then be used to improve performance. In this paper, we first present an overview of the HHG framework. Next, we introduce the two performance metrics in detail. These metrics are then applied to three different data structures used to implement a Gauß-Seidel smoothing algorithm. Performance results and an interpretation of the underlying interactions of the data structures with several relevant supercomputing architectures are given. Finally, we present a brief discussion of some performance results of the HHG framework, followed by concluding remarks.
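To make the smoother discussed in the abstract concrete, the following is a minimal sketch of a Gauß-Seidel sweep and a residual calculation for a 1D Poisson problem on a uniform grid. This toy example is not the HHG implementation (which operates on hierarchical hybrid grid patches in parallel); the function names, the 3-point stencil, and the problem setup are illustrative assumptions. It does, however, show the defining property that matters for the memory-access analysis: each updated value is reused immediately within the same sweep.

```python
def gauss_seidel_sweep(u, f, h):
    # One in-place Gauss-Seidel sweep for -u'' = f on a uniform 1D grid
    # with fixed Dirichlet boundary values u[0] and u[-1]. Unlike Jacobi,
    # freshly updated neighbors (u[i-1]) are used immediately, which
    # changes the load/store pattern seen by the memory subsystem.
    for i in range(1, len(u) - 1):
        u[i] = 0.5 * (u[i - 1] + u[i + 1] + h * h * f[i])
    return u


def residual_norm(u, f, h):
    # Max-norm of the residual r = f - A u for the 3-point Laplacian,
    # the "residual calculation" component mentioned in the abstract.
    return max(
        abs(f[i] - (-u[i - 1] + 2.0 * u[i] - u[i + 1]) / (h * h))
        for i in range(1, len(u) - 1)
    )


if __name__ == "__main__":
    n = 17                      # grid points, including both boundaries
    h = 1.0 / (n - 1)
    u = [0.0] * n               # zero initial guess, zero Dirichlet BCs
    f = [1.0] * n               # constant right-hand side
    for _ in range(500):
        gauss_seidel_sweep(u, f, h)
    # For -u'' = 1 on [0,1] the discrete solution is u(x) = x(1-x)/2,
    # so the midpoint value converges toward 0.125.
    print(u[n // 2])
```

In HHG, the interesting question is not this arithmetic but how the loop's data layout interacts with caches, which is exactly what the BM and LPMM metrics are designed to predict for the three candidate data structures.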