A stabilized finite element formulation for three-dimensional unsteady incompressible flows is implemented on a distributed memory parallel computer. A matrix-free version of the GMRES algorithm is used to solve the equation systems implicitly. The scalability of the computations on a 64-processor Linux cluster is evaluated for moderate- to large-size problems. A method is proposed for estimating the speedup for large-scale problems, where computations on a single processor are not possible. Superlinear speedup is observed, perhaps for the first time, for a large-scale problem with more than 44 million nodes and 176 million equations. The performance of the various subactivities of the program is monitored to investigate the cause. It is found that the formation of the RHS vector and the preconditioner achieves a very high level of superlinear speedup as the number of processors increases. As a result, even though the network time for interprocessor communication grows with the number of processors, an overall superlinear speedup is realized for large-scale problems. The superlinear speedup is attributed to cache-related effects. The performance of the matrix-based and matrix-free versions of the GMRES algorithm is compared. It is found that, for large-scale applications, the matrix-free version outperforms its counterpart for reasonable dimensions of the Krylov subspace. The effect of mesh partitioning on scalability is also studied. Partitioning significantly reduces the communication time, which improves the overall speedup. The parallel implementation is used to study the wake instabilities in flow past a stationary circular cylinder at Re=150, 200 and 300. The Re=150 flow is found to be two-dimensional, while mode-A and mode-B instabilities are observed at Re=200 and 300, respectively. The Re=300 flow is associated with a low-frequency modulation in addition to the vortex shedding frequency.
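The abstract does not say how the speedup is estimated when a single-processor run is impossible. As a rough illustration only, the sketch below uses one common convention, which is an assumption here and not necessarily the authors' method: the run on the smallest feasible processor count `base_procs` is taken as the reference and assumed to scale ideally up to that count. The function name `relative_speedup` and the timings are hypothetical; the numbers are made up for illustration and are not results from the paper.

```python
def relative_speedup(timings, base_procs=None):
    """Estimate speedup and efficiency from wall-clock timings.

    timings: dict mapping processor count -> wall time in seconds.
    base_procs: reference processor count (defaults to the smallest one).

    ASSUMPTION: the reference run is treated as having ideal speedup
    (speedup == base_procs), since no single-processor time exists.
    """
    if base_procs is None:
        base_procs = min(timings)
    t_base = timings[base_procs]
    result = {}
    for p in sorted(timings):
        speedup = base_procs * t_base / timings[p]
        efficiency = speedup / p  # > 1 indicates superlinear speedup
        result[p] = (speedup, efficiency)
    return result


# Hypothetical timings (seconds), NOT measurements from the paper.
timings = {8: 800.0, 16: 380.0, 32: 180.0, 64: 100.0}
table = relative_speedup(timings)

for p, (s, e) in table.items():
    label = "superlinear" if e > 1.0 else "linear or sublinear"
    print(f"p={p:3d}  speedup={s:6.2f}  efficiency={e:5.3f}  ({label})")
```

Under this convention, an efficiency above 1 relative to the reference run is what would be reported as superlinear speedup. Applying the same function to per-activity timings (for example RHS formation versus communication) would show which parts of the program drive it.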