Most parallel systems on which MPI is used are now hierarchical; systems built from SMP nodes are a common example. Many papers have presented algorithms that exploit shared memory to optimize collective operations, to good effect. But how much of the performance benefit comes simply from tailoring the algorithm to the hierarchical topology of the system? We describe an implementation of many of the MPI collectives that is based entirely on message-passing primitives and exploits the two-level hierarchy. Our results show that exploiting shared memory directly usually gives only a small additional benefit, and they suggest design approaches for the cases where the benefit is large.
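To make the two-level idea concrete, the following is a minimal sketch, not the implementation described above. It simulates a broadcast schedule in plain Python and counts how many sends cross node boundaries. It assumes block placement (consecutive ranks share a node) and a binomial tree at both levels; the function names `flat_binomial_bcast`, `two_level_bcast`, and `inter_node_sends` are hypothetical and chosen only for illustration.

```python
def flat_binomial_bcast(nranks, root=0):
    """Return the (src, dst) sends of a binomial-tree broadcast over nranks ranks."""
    sends = []
    mask = 1
    while mask < nranks:
        # In this round, relative ranks 0..mask-1 already hold the data
        # and each forwards it to the rank `mask` positions further on.
        for rel in range(mask):
            peer = rel + mask
            if peer < nranks:
                sends.append(((rel + root) % nranks, (peer + root) % nranks))
        mask <<= 1
    return sends


def two_level_bcast(nranks, ranks_per_node, root=0):
    """Broadcast among one leader per node first, then inside each node.

    Assumption: ranks are placed in blocks, so rank r lives on node
    r // ranks_per_node.
    """
    nnodes = (nranks + ranks_per_node - 1) // ranks_per_node
    root_node = root // ranks_per_node
    # The root leads its own node; every other node is led by its lowest rank.
    leaders = [root if n == root_node else n * ranks_per_node
               for n in range(nnodes)]

    # Stage 1: inter-node broadcast among the leaders only.
    sends = [(leaders[s], leaders[d])
             for s, d in flat_binomial_bcast(nnodes, root=root_node)]

    # Stage 2: each leader broadcasts to the other ranks on its node.
    for n in range(nnodes):
        members = list(range(n * ranks_per_node,
                             min((n + 1) * ranks_per_node, nranks)))
        local_root = members.index(leaders[n])
        for s, d in flat_binomial_bcast(len(members), root=local_root):
            sends.append((members[s], members[d]))
    return sends


def inter_node_sends(sends, ranks_per_node):
    """Count the sends whose source and destination sit on different nodes."""
    return sum(1 for s, d in sends
               if s // ranks_per_node != d // ranks_per_node)


if __name__ == "__main__":
    flat = flat_binomial_bcast(16)
    hier = two_level_bcast(16, 4)
    print("flat inter-node sends:     ", inter_node_sends(flat, 4))
    print("two-level inter-node sends:", inter_node_sends(hier, 4))
```

Under these assumptions, with 16 ranks at 4 per node, both schedules issue 15 sends in total. The flat tree sends 12 of them across nodes, while the two-level schedule sends only 3, one per non-root node. The sketch counts messages only; it says nothing about the measured performance reported above.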