Hierarchical Collectives in MPICH2

Authors and affiliations:
Hao Zhu, Department of Computer Science, University of Illinois, Urbana, USA 61801
David Goodell, Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, USA 60439
William Gropp, Department of Computer Science, University of Illinois, Urbana, USA 61801
Rajeev Thakur, Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, USA 60439
Venue:
Proceedings of the 16th European PVM/MPI Users' Group Meeting on Recent Advances in Parallel Virtual Machine and Message Passing Interface
Year:
2009

Abstract

Most parallel systems on which MPI is used are now hierarchical, such as systems with SMP nodes. Many papers have presented algorithms that exploit shared memory to optimize collective operations to good effect. But how much of the performance benefit comes from tailoring the algorithm to the hierarchical topology of the system? We describe an implementation of many of the MPI collectives, based entirely on message-passing primitives, that exploits the two-level hierarchy. Our results show that exploiting shared memory directly usually gives only a small additional benefit, and they suggest design approaches for the cases where the benefit is large.
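
To make the idea of a two-level, message-passing-only collective concrete, the following is a minimal sketch of a hierarchical broadcast, simulated in plain Python rather than MPI. It is an illustration under stated assumptions, not the implementation described in the paper: the function names, the leader-selection rule (the root on its own node, the lowest rank elsewhere), the simple doubling-style binomial tree, and the 4-node, block-placement layout are all hypothetical choices made for this example.

```python
def binomial_bcast_edges(ranks, root):
    """Return the (sender, receiver) pairs of a simple doubling-style
    binomial-tree broadcast over `ranks`, starting at `root`.
    Assumption: a textbook tree, not the tree MPICH2 actually uses."""
    order = [root] + [r for r in ranks if r != root]
    edges = []
    have = 1  # how many processes at the front of `order` hold the data
    while have < len(order):
        for i in range(min(have, len(order) - have)):
            edges.append((order[i], order[i + have]))
        have *= 2
    return edges


def hierarchical_bcast_edges(node_of, root):
    """Two-level broadcast: first among one leader per node, then inside
    each node. Only point-to-point messages are modelled."""
    nodes = {}
    for rank, node in sorted(node_of.items()):
        nodes.setdefault(node, []).append(rank)
    # Hypothetical leader rule: the root leads its own node,
    # the lowest rank leads every other node.
    leaders = {
        node: (root if root in members else members[0])
        for node, members in nodes.items()
    }
    # Phase 1: inter-node broadcast among the node leaders.
    edges = binomial_bcast_edges(sorted(leaders.values()), root)
    # Phase 2: intra-node broadcast from each leader to its node's ranks.
    for node, members in nodes.items():
        edges += binomial_bcast_edges(members, leaders[node])
    return edges


def inter_node_messages(edges, node_of):
    """Count messages whose sender and receiver sit on different nodes."""
    return sum(1 for src, dst in edges if node_of[src] != node_of[dst])


# Assumed layout: 16 ranks, 4 per node, placed in blocks (ranks 0-3 on node 0, ...).
node_of = {rank: rank // 4 for rank in range(16)}
flat = binomial_bcast_edges(list(range(16)), root=0)
hier = hierarchical_bcast_edges(node_of, root=0)

print("flat inter-node messages:", inter_node_messages(flat, node_of))
print("hierarchical inter-node messages:", inter_node_messages(hier, node_of))
```

Under these assumptions both schedules send 15 messages in total, but the topology-unaware tree sends 12 of them between nodes while the two-level schedule sends only 3. The exact counts depend on the rank placement and on the tree shape chosen, so the sketch shows only the kind of saving that topology awareness can give without any use of shared memory.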