Numerical computation of internal & external flows: fundamentals of numerical discretization
Introduction to parallel computing: design and analysis of algorithms
Using MPI: portable parallel programming with the message-passing interface
Parallel programming with MPI
Dynamic load distributions for adaptive computations on MIMD machines using hybrid genetic algorithms
A Programming Methodology for Dual-Tier Multicomputers
IEEE Transactions on Software Engineering - Special issue on architecture-independent languages and software tools for parallel processing
Parallel programming in OpenMP
A programming model for block-structured scientific calculations on SMP clusters
A finite-difference domain decomposition method using local corrections for the solution of Poisson's equation
Overlapping communication and computation by using a hybrid MPI/SMPSs approach
Proceedings of the 24th ACM International Conference on Supercomputing
Machines composed of a distributed collection of shared-memory (SMP) nodes are becoming common for parallel computing, and on many such machines OpenMP can be combined with MPI. Motivations for combining OpenMP and MPI are discussed. While OpenMP is typically used to exploit loop-level parallelism, it can also be used to enable coarse-grain parallelism, potentially leading to less overhead. We show how coarse-grain OpenMP parallelism can also be used to overlap MPI communication with computation in stencil-based grid programs, such as a program performing Gauss-Seidel iteration with red-black ordering. Spatial subdivision (domain decomposition) is used to assign a portion of the grid to each thread. One thread is assigned a null calculation region so that it is free to perform communication. Example calculations were run on an IBM SP using both the Kuck & Associates and IBM compilers.
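The decomposition described in the abstract can be illustrated with a minimal sketch. It is a serial Python stand-in, not the authors' code: the function names (`partition_rows`, `sweep_color`, `red_black_iteration`), the row-strip decomposition, and the choice of Laplace's equation with a 5-point stencil are all assumptions made for illustration. The real program would run each region in its own OpenMP thread and have thread 0 issue the MPI calls, which appear here only as comments.

```python
def partition_rows(n_interior_rows, n_threads):
    """Split interior rows 1..n_interior_rows among the compute threads.

    Thread 0 gets an empty (null) range so it is free to communicate.
    Returns one half-open (start, stop) row range per thread.
    Hypothetical helper; a row-strip decomposition is assumed.
    """
    workers = n_threads - 1
    if workers < 1:
        raise ValueError("need at least one compute thread besides thread 0")
    base, extra = divmod(n_interior_rows, workers)
    ranges = [(1, 1)]  # null calculation region for the communication thread
    start = 1
    for w in range(workers):
        stop = start + base + (1 if w < extra else 0)
        ranges.append((start, stop))
        start = stop
    return ranges


def sweep_color(grid, rows, color):
    """Update in place the interior points of one color in a row range.

    A point (i, j) has color (i + j) % 2. Each update reads only
    neighbors of the other color, so same-color points are independent.
    """
    start, stop = rows
    n_cols = len(grid[0])
    for i in range(start, stop):
        # First interior column whose color matches on this row.
        j0 = 1 + ((i + 1 + color) % 2)
        for j in range(j0, n_cols - 1, 2):
            grid[i][j] = 0.25 * (grid[i - 1][j] + grid[i + 1][j]
                                 + grid[i][j - 1] + grid[i][j + 1])


def red_black_iteration(grid, ranges):
    """One Gauss-Seidel iteration with red-black ordering."""
    for color in (0, 1):  # 0 = red, 1 = black
        # In a hybrid program each range would be swept by its own thread,
        # while thread 0 (null range) exchanges boundary rows via MPI.
        for rows in ranges:
            sweep_color(grid, rows, color)
        # A thread barrier would go here before the next color.


if __name__ == "__main__":
    grid = [[0.0] * 8 for _ in range(8)]
    grid[0] = [1.0] * 8  # fixed boundary value on the top edge
    ranges = partition_rows(6, 4)
    red_black_iteration(grid, ranges)
    print(ranges)
```

Because points of one color depend only on points of the other color, the result of a sweep does not depend on how the rows are divided among threads, which is what makes it safe to leave one thread with no rows at all.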