A Simple, Pipelined Algorithm for Large, Irregular All-gather Problems

Authors:
Jesper Larsson Träff;Andreas Ripke;Christian Siebert;Pavan Balaji;Rajeev Thakur;William Gropp
Affiliations:
NEC Laboratories Europe, NEC Europe Ltd., Sankt Augustin, Germany D-53757;NEC Laboratories Europe, NEC Europe Ltd., Sankt Augustin, Germany D-53757;NEC Laboratories Europe, NEC Europe Ltd., Sankt Augustin, Germany D-53757;Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, USA IL 60439;Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, USA IL 60439;Department of Computer Science, University of Illinois, Urbana, USA IL 61801
Venue:
Proceedings of the 15th European PVM/MPI Users' Group Meeting on Recent Advances in Parallel Virtual Machine and Message Passing Interface
Year:
2008

Abstract

We present and evaluate a new, simple, pipelined algorithm for large, irregular all-gather problems, useful for the implementation of the MPI_Allgatherv collective operation of MPI. The algorithm can be viewed as an adaptation of a linear ring algorithm for regular all-gather problems for single-ported, clustered multiprocessors to the irregular problem. Compared to the standard ring algorithm, whose performance is dominated by the largest data size broadcast by a process (times the number of processes), the performance of the new algorithm depends only on the total amount of data over all processes. The new algorithm has been implemented within different MPI libraries. Benchmark results on NEC SX-8, Linux clusters with InfiniBand and Gigabit Ethernet, Blue Gene/P, and SiCortex systems show huge performance gains in accordance with the expected behavior.
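
The contrast drawn in the abstract can be illustrated with a small back-of-the-envelope calculation. The sketch below is not the authors' algorithm or cost model; it only compares the two quantities the abstract names, under the simplifying assumptions that rounds are synchronous, latency is ignored, and cost is proportional to the bytes sent. The function names `ring_cost` and `total_data` and the example sizes are hypothetical.

```python
def ring_cost(sizes):
    """Bytes on the critical path of a round-synchronous ring all-gather.

    Assumption: in round k, process i forwards the block that originated
    at process (i - k) mod p, and a round lasts as long as its largest
    transfer. Every round therefore has the largest block in flight.
    """
    p = len(sizes)
    total = 0
    for k in range(p - 1):
        total += max(sizes[(i - k) % p] for i in range(p))
    return total


def total_data(sizes):
    """Total amount of data over all processes, the quantity the abstract
    says governs the pipelined algorithm (used here only as a proxy)."""
    return sum(sizes)


# Hypothetical irregular problem: one process contributes 1 MB,
# the other seven contribute 1 kB each.
sizes = [1_000_000] + [1_000] * 7

print(ring_cost(sizes))   # 7000000  (7 rounds, each bounded by the 1 MB block)
print(total_data(sizes))  # 1007000
```

For this skewed input the ring proxy is roughly seven times the total data volume, which is the kind of gap the abstract attributes to irregular problems. For a regular problem (all sizes equal) the two quantities differ only by one block.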