We present a process cooperation algorithm for broadcasting m messages among n processes, m ≥ 1, n ≥ 1, in one-port, fully connected communication systems. The algorithm organizes the n processes into 2^⌊log n⌋ units, each consisting of one or two processes. Messages are broadcast among the units according to a basic communication schedule. The processes in each two-process unit cooperate to carry out the basic schedule so that, at any step, each process holds at most one message that the other has not yet received. The algorithm completes the broadcast in ⌈log n⌉ + m − 1 communication steps, which is theoretically optimal. An empirical study shows that it significantly outperforms other widely used algorithms when the data to broadcast is large. Efficient construction of the communication schedule is a salient feature of the algorithm: both the basic schedule and the cooperation schedule are constructed in O(log n) bitwise operations on process ranks.
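The arithmetic behind the unit count and the step bound can be illustrated with a minimal sketch. It assumes that the logarithms are base 2 and that the unit count reads as 2 raised to the power ⌊log n⌋. The function names `unit_layout` and `optimal_steps` are hypothetical and do not come from the paper. The sketch only counts units and steps; it does not construct the basic or cooperation schedules.

```python
def unit_layout(n: int) -> tuple[int, int, int]:
    """Return (units, two_process_units, one_process_units) for n processes.

    Assumes 2**floor(log2 n) units, each holding one or two processes.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    k = n.bit_length() - 1          # floor(log2 n), computed with bit operations
    units = 1 << k                  # 2**k units in total
    two_process = n - units         # the processes left over pair up with a unit
    one_process = units - two_process
    return units, two_process, one_process


def optimal_steps(n: int, m: int) -> int:
    """Step count ceil(log2 n) + m - 1 stated in the abstract."""
    if n < 1 or m < 1:
        raise ValueError("n and m must be at least 1")
    ceil_log = (n - 1).bit_length() # ceil(log2 n) for n >= 1
    return ceil_log + m - 1


if __name__ == "__main__":
    print(unit_layout(6))           # 4 units: 2 with two processes, 2 with one
    print(optimal_steps(6, 10))     # 3 + 10 - 1 steps
```

For example, with n = 6 this gives four units, two of which hold two processes, and broadcasting m = 10 messages takes 12 steps under the stated bound.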