We evaluate the performance of four MPI Allgather algorithms on Fast Ethernet: ring, recursive doubling, Bruck, and neighbor exchange. The first three are widely used today. The neighbor exchange algorithm, which we proposed recently, incorporates pairwise exchange and is expected to perform better in certain configurations, mainly when using TCP/IP over Ethernet. We tested the four algorithms on the terascale Linux clusters DeepComp 6800 and DAWNING 4000A using TCP/IP over Fast Ethernet. The results show that the neighbor exchange algorithm performs best for long messages, the ring algorithm performs best for medium-size messages, and the recursive doubling algorithm performs best for short messages.
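
To make the difference between the ring and neighbor exchange patterns concrete, the following is a minimal single-process simulation, not an MPI implementation and not the code evaluated here. It tracks only which blocks each rank holds after each step. The function names, the `partner` helper, and the exact pairing schedule are assumptions based on how neighbor exchange is commonly described (an even number of processes, p/2 pairwise steps); the abstract itself does not spell out these details.

```python
def ring_allgather(p):
    """Simulate ring allgather for p ranks; return (blocks held per rank, steps)."""
    have = [{r} for r in range(p)]   # each rank starts with its own block
    fresh = list(range(p))           # block each rank will forward next
    steps = 0
    for _ in range(p - 1):
        # Every rank receives one block from its left neighbor.
        incoming = [fresh[(r - 1) % p] for r in range(p)]
        for r in range(p):
            have[r].add(incoming[r])
        fresh = incoming             # forward the newly received block next
        steps += 1
    return have, steps


def partner(r, step, p):
    """Hypothetical helper: pairwise-exchange partner of rank r at a given step.

    Even ranks pair with r+1 on even steps and r-1 on odd steps;
    odd ranks do the opposite, so every exchange is mutual.
    """
    if (r % 2 == 0) == (step % 2 == 0):
        return (r + 1) % p
    return (r - 1) % p


def neighbor_exchange_allgather(p):
    """Simulate neighbor exchange allgather (assumes p is even)."""
    if p % 2 != 0:
        raise ValueError("this sketch assumes an even number of ranks")
    have = [{r} for r in range(p)]
    fresh = [{r} for r in range(p)]  # blocks each rank sends in the next step
    steps = 0
    for step in range(p // 2):
        # Each rank swaps its fresh blocks with its partner for this step.
        received = [fresh[partner(r, step, p)] for r in range(p)]
        for r in range(p):
            have[r] |= received[r]
        if step == 0:
            # After the first swap, each pair forwards both of its blocks.
            fresh = [have[r].copy() for r in range(p)]
        else:
            # Afterwards, each rank forwards the two blocks it just received.
            fresh = received
        steps += 1
    return have, steps


if __name__ == "__main__":
    for p in (4, 8):
        _, ring_steps = ring_allgather(p)
        _, ne_steps = neighbor_exchange_allgather(p)
        print(p, ring_steps, ne_steps)
```

In this model the ring pattern takes p - 1 steps with one block per message, while the neighbor exchange pattern takes p/2 steps and sends two blocks per message after the first step. The simulation shows only the communication schedule; it says nothing about the measured timings reported above.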