Most HPC systems are clusters of shared-memory nodes. Parallel programming must therefore combine distributed-memory parallelization across the node interconnect with shared-memory parallelization inside each node. The hybrid MPI+OpenMP programming model is compared with pure MPI and with compiler-based parallelization. The paper focuses on bandwidth and latency aspects, and on whether the programming paradigms allow the optimization of communication and computation to be separated. Benchmark results are presented for hybrid and pure MPI communication.
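As an illustration of the hybrid model described above, the following minimal C sketch combines MPI between nodes with OpenMP inside a node in the common "masteronly" style, where only the master thread calls MPI outside of parallel regions. The array size, ring-exchange pattern, and reduction are illustrative assumptions for this sketch, not benchmarks or code from the paper.

#include <mpi.h>
#include <omp.h>
#include <stdio.h>

#define N 1000000

static double local[N], halo[N];   /* one data block per MPI process (per node); sizes are placeholders */

int main(int argc, char **argv) {
    int provided, rank, size;

    /* MPI_THREAD_FUNNELED: threads exist, but only the master thread calls MPI. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    double sum = 0.0;

    /* Shared-memory parallelization inside the node: OpenMP threads share `local`. */
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < N; i++) {
        local[i] = (double)(rank + i);
        sum += local[i];
    }

    /* Distributed-memory parallelization on the node interconnect: a ring
       exchange issued by the master thread only, outside the parallel region. */
    int right = (rank + 1) % size;
    int left  = (rank - 1 + size) % size;
    MPI_Sendrecv(local, N, MPI_DOUBLE, right, 0,
                 halo,  N, MPI_DOUBLE, left,  0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    double global_sum = 0.0;
    MPI_Reduce(&sum, &global_sum, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0)
        printf("global sum = %f\n", global_sum);

    MPI_Finalize();
    return 0;
}

Built with a hybrid toolchain (for example mpicc -fopenmp) and launched with one MPI process per node and OMP_NUM_THREADS set to the cores per node, this mirrors the split the abstract describes: MPI carries the inter-node bandwidth and latency, while OpenMP handles the intra-node computation, so the communication and computation parts can be tuned separately.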