Performance modeling and tuning of an unstructured mesh CFD application

Authors:
William D. Gropp;Dinesh K. Kaushik;David E. Keyes;Barry Smith
Affiliations:
Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, IL;Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, IL;Mathematics & Statistics Department, Old Dominion University, Norfolk, VA, ISCR, Lawrence Livermore National Laboratory, Livermore, CA and ICASE, NASA Langley Research Center, Hampton, VA;Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, IL
Venue:
Proceedings of the 2000 ACM/IEEE conference on Supercomputing
Year:
2000

Abstract

This paper describes performance tuning experiences with a three-dimensional unstructured grid Euler flow code from NASA, which we have reimplemented in the PETSc framework and ported to several large-scale machines, including the ASCI Red and Blue Pacific machines, the SGI Origin, the Cray T3E, and Beowulf clusters. The code achieves a respectable level of performance for sparse problems, typical of scientific and engineering codes based on partial differential equations, and scales well up to thousands of processors. Since the gap between CPU speed and memory access rate is widening, the code is analyzed from a memory-centric perspective (in contrast to the traditional flop-oriented view) to understand its sequential and parallel performance. Performance tuning is approached on three fronts: data layouts to enhance locality of reference, algorithmic parameters, and the parallel programming model. This effort was guided partly by some simple performance models developed for the sparse matrix-vector product operation.
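The abstract does not spell out the performance models, so the following is only an illustrative sketch of what a memory-centric estimate for the sparse matrix-vector product can look like, not the authors' model. It assumes a compressed sparse row (CSR) layout, 8-byte values, 4-byte indices, and an idealized cache in which every matrix entry and every vector entry crosses the memory bus exactly once. All function names and parameters are hypothetical.

```python
def csr_spmv(indptr, indices, data, x):
    """Compute y = A @ x for a matrix A stored in CSR form."""
    n_rows = len(indptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        # One multiply and one add per stored nonzero in row i.
        for k in range(indptr[i], indptr[i + 1]):
            acc += data[k] * x[indices[k]]
        y[i] = acc
    return y


def spmv_traffic_bytes(n_rows, n_cols, nnz, value_bytes=8, index_bytes=4):
    """Idealized memory traffic (assumption: each item is moved once)."""
    # Matrix: nnz values, nnz column indices, n_rows + 1 row pointers.
    matrix = nnz * (value_bytes + index_bytes) + (n_rows + 1) * index_bytes
    # Vectors: read x once, write y once.
    vectors = (n_cols + n_rows) * value_bytes
    return matrix + vectors


def spmv_bandwidth_bound_flops(n_rows, n_cols, nnz, bandwidth_bytes_per_s,
                               value_bytes=8, index_bytes=4):
    """Upper bound on flop/s when memory bandwidth is the only limit."""
    flops = 2 * nnz  # one multiply and one add per nonzero
    seconds = spmv_traffic_bytes(
        n_rows, n_cols, nnz, value_bytes, index_bytes
    ) / bandwidth_bytes_per_s
    return flops / seconds
```

Under these assumptions the kernel moves roughly 12 bytes per nonzero for only 2 floating-point operations, which is why a flop count alone says little about achievable speed and why data layout changes that reduce memory traffic are a natural tuning target.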