Faster topology-aware collective algorithms through non-minimal communication
Proceedings of the 17th ACM SIGPLAN symposium on Principles and Practice of Parallel Programming
Several algorithms are discussed for implementing global combine (summation) on distributed-memory computers using a two-dimensional mesh interconnect with wormhole routing. These include algorithms that are asymptotically optimal for short vectors (O(log p) for p processing nodes) and for long vectors (O(n) for n data elements per node), as well as hybrid algorithms that are superior for intermediate n. Performance models are developed that account for link conflicts and other characteristics of the underlying communication system. The models are validated against experimental data from the Intel Touchstone DELTA computer, and each of the combine algorithms is shown to be superior under some circumstances.
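The two asymptotic regimes described above can be illustrated with classic combine schemes: recursive doubling, which finishes in O(log p) exchange rounds and suits short vectors, and a ring-based reduce-scatter followed by allgather, which moves O(n) data per node and suits long vectors. The sketch below is a sequential simulation of both patterns under simplifying assumptions (power-of-two p for recursive doubling, n divisible by p for the ring); it is an illustration of the general techniques, not the paper's actual mesh-specific implementation.

```python
def recursive_doubling_combine(vectors):
    """Short-vector scheme: O(log p) rounds; in round d each node exchanges
    its full partial sum with the node at XOR distance d. Assumes p is a
    power of two. Returns the per-node result (all equal to the total sum)."""
    p = len(vectors)
    data = [list(v) for v in vectors]
    d = 1
    while d < p:
        nxt = [None] * p
        for i in range(p):
            partner = i ^ d  # pairwise exchange partner this round
            nxt[i] = [a + b for a, b in zip(data[i], data[partner])]
        data = nxt
        d *= 2
    return data

def ring_combine(vectors):
    """Long-vector scheme: reduce-scatter then allgather around a ring.
    Each node sends O(n) data in total, asymptotically independent of p.
    Assumes n is divisible by p (illustration only)."""
    p = len(vectors)
    n = len(vectors[0])
    assert n % p == 0, "illustrative assumption: n divisible by p"
    chunk = n // p
    data = [list(v) for v in vectors]
    # Reduce-scatter: after p-1 steps, node i owns the full sum of one chunk.
    for step in range(p - 1):
        nxt = [list(v) for v in data]
        for i in range(p):
            src = (i - 1) % p          # ring neighbor
            c = (i - step - 1) % p     # chunk arriving at node i this step
            for k in range(c * chunk, (c + 1) * chunk):
                nxt[i][k] = data[i][k] + data[src][k]
        data = nxt
    # Allgather: circulate each completed chunk around the ring.
    for step in range(p - 1):
        nxt = [list(v) for v in data]
        for i in range(p):
            src = (i - 1) % p
            c = (i - step) % p         # completed chunk arriving this step
            for k in range(c * chunk, (c + 1) * chunk):
                nxt[i][k] = data[src][k]
        data = nxt
    return data
```

The trade-off mirrors the abstract's analysis: recursive doubling sends the entire vector in every round (cheap when n is small), while the ring pipeline sends only n/p-sized chunks per step (cheap when n is large); hybrids interpolate between the two for intermediate n.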