We present an optimal algorithm for broadcasting $m$ messages from one process to $n-1$ other processes in a one-port fully connected communication model, where $m \ge 1$, $n > 1$. In this algorithm, the processes are organized into $2^{\lfloor \log n \rfloor}$ cooperation units, each consisting of one or two processes. Messages are broadcast among the units following a basic schedule. Processes in each two-process unit cooperate to carry out the basic schedule. At any communication round, either process has at most one message that the other has not received. This algorithm completes the broadcast operation in $m + \lceil \log n \rceil - 1$ communication rounds, which is theoretically optimal. We consider practical issues for efficient implementation of the algorithm and develop a schedule construction that has both time and space complexity of $O(\log n)$. Empirical study shows that this algorithm outperforms other widely used algorithms significantly when the data to broadcast is large.
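The counting in the abstract can be checked with a short sketch. It only reproduces the arithmetic stated above (the number of cooperation units and the round bound), not the schedule itself. It assumes that log means the base-2 logarithm, and the function names `floor_log2`, `ceil_log2`, `cooperation_units`, and `optimal_rounds` are hypothetical helpers introduced for illustration.

```python
def floor_log2(n: int) -> int:
    """Return floor(log2(n)) for n >= 1 using integer arithmetic."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return n.bit_length() - 1


def ceil_log2(n: int) -> int:
    """Return ceil(log2(n)) for n >= 1 using integer arithmetic."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return (n - 1).bit_length()


def cooperation_units(n: int) -> tuple[int, int, int]:
    """Count units for n processes (assumption: base-2 logarithm).

    Returns (total_units, one_process_units, two_process_units).
    With 2**floor(log2(n)) units of size one or two, exactly
    n - total_units of them must hold two processes.
    """
    total = 1 << floor_log2(n)
    two_process = n - total
    one_process = total - two_process
    return total, one_process, two_process


def optimal_rounds(m: int, n: int) -> int:
    """Round count m + ceil(log2(n)) - 1 stated in the abstract."""
    if m < 1 or n < 2:
        raise ValueError("requires m >= 1 and n > 1")
    return m + ceil_log2(n) - 1


if __name__ == "__main__":
    for n in (2, 6, 8, 13):
        total, one, two = cooperation_units(n)
        print(f"n={n}: units={total} (one-process={one}, two-process={two}), "
              f"rounds for m=10: {optimal_rounds(10, n)}")
```

For example, with n = 6 processes there are 4 units, two of which hold two processes, and broadcasting m = 10 messages takes 10 + 3 - 1 = 12 rounds under the stated bound.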