A hybrid two-level MPI/OpenMP parallelization is implemented in the general-purpose spectral/hp element CFD code NekTar to exploit the hierarchical structures arising in deterministic and stochastic CFD problems. We take a coarse-grained approach to shared-memory parallelism with OpenMP and employ a workload-splitting scheme that reduces OpenMP synchronization to a minimum. The hybrid implementation shows good scalability with respect to both problem size and, for a fixed problem size, the number of processors. For the same processor count, the hybrid model with 2 (or 4) OpenMP threads per MPI process outperforms both pure MPI and pure OpenMP on the NCSA SGI Origin 2000, whereas the pure MPI model performs best on the IBM SP3 at SDSC and on the Compaq Alpha cluster at PSC. A key new result is that the use of threads effectively facilitates p-refinement, which is crucial for adaptive discretization with high-order methods.
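The coarse-grained hybrid pattern described in the abstract can be illustrated with a minimal sketch (not taken from the paper): one long-lived OpenMP parallel region per MPI process, a static split of the local elements across threads, and synchronization confined to the points where inter-process communication occurs. Names such as compute_element, halo_exchange, NELEM, and NSTEPS are illustrative placeholders, not NekTar routines.

```c
/* Hypothetical sketch of coarse-grained MPI+OpenMP hybrid parallelism.
 * One long-lived OpenMP parallel region per MPI process; each thread
 * owns a fixed slice of the local elements, so barriers are needed
 * only around the MPI halo exchange. */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

#define NELEM  1024   /* local elements per MPI process (illustrative) */
#define NSTEPS 100    /* time steps (illustrative) */

static double field[NELEM];

/* Placeholder for per-element spectral/hp element work. */
static void compute_element(int e) { field[e] += 1.0; }

/* Placeholder for exchanging interface data with neighboring ranks. */
static void halo_exchange(MPI_Comm comm) { MPI_Barrier(comm); }

int main(int argc, char **argv)
{
    int provided, rank;
    /* FUNNELED: only the master thread makes MPI calls. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    #pragma omp parallel
    {
        int nt  = omp_get_num_threads();
        int tid = omp_get_thread_num();
        /* Static workload split: this thread owns elements [lo, hi). */
        int lo = (NELEM * tid) / nt;
        int hi = (NELEM * (tid + 1)) / nt;

        for (int step = 0; step < NSTEPS; ++step) {
            for (int e = lo; e < hi; ++e)
                compute_element(e);

            /* Threads synchronize only around communication. */
            #pragma omp barrier
            #pragma omp master
            halo_exchange(MPI_COMM_WORLD);
            #pragma omp barrier
        }
    }

    if (rank == 0) printf("done\n");
    MPI_Finalize();
    return 0;
}
```

In this sketch the thread-to-element assignment is fixed for the whole run, so no OpenMP work-sharing or locking is needed inside the time loop; the only synchronization per step is the pair of barriers bracketing the master-thread communication, which is the sense in which a coarse-grained splitting keeps OpenMP synchronization to a minimum.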