Process cooperation in multiple message broadcast

Authors:
Bin Jia
Affiliations:
IBM Systems & Technology Group, 2455 South Road, Poughkeepsie, NY 12601, USA
Venue:
Parallel Computing
Year:
2009

Abstract

We present an optimal algorithm for broadcasting $m$ messages from one process to $n-1$ other processes in a one-port fully connected communication model, where $m \ge 1$, $n > 1$. In this algorithm, the processes are organized into $2^{\lfloor \log n \rfloor}$ cooperation units, each consisting of one or two processes. Messages are broadcast among the units following a basic schedule. Processes in each two-process unit cooperate to carry out the basic schedule. At any communication round, either process has at most one message that the other has not received. This algorithm completes the broadcast operation in $m + \lceil \log n \rceil - 1$ communication rounds, which is theoretically optimal. We consider practical issues for efficient implementation of the algorithm and develop a schedule construction that has both time and space complexity of $O(\log n)$. Empirical study shows that this algorithm outperforms other widely used algorithms significantly when the data to broadcast is large.
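
The counts stated in the abstract can be checked with a short calculation. The following Python sketch is illustrative only and is not the paper's schedule construction. It assumes base-2 logarithms, and the function name `cooperation_layout` and its returned keys are hypothetical. It computes the number of cooperation units, how many of them would hold one or two processes, and the round count given by the abstract's formula.

```python
def cooperation_layout(n: int, m: int) -> dict:
    """Unit and round counts from the abstract's formulas (assumes log base 2)."""
    if n < 2 or m < 1:
        raise ValueError("expected n > 1 processes and m >= 1 messages")

    floor_log = n.bit_length() - 1    # floor(log2 n)
    ceil_log = (n - 1).bit_length()   # ceil(log2 n)

    units = 1 << floor_log            # 2^floor(log2 n) cooperation units
    # Each unit holds one or two processes, so the processes left over
    # after giving every unit one member determine the two-process units.
    two_process_units = n - units
    one_process_units = units - two_process_units

    return {
        "units": units,
        "one_process_units": one_process_units,
        "two_process_units": two_process_units,
        "rounds": m + ceil_log - 1,   # m + ceil(log2 n) - 1 rounds
    }


if __name__ == "__main__":
    print(cooperation_layout(n=6, m=4))
```

For example, with n = 6 and m = 4 this gives 4 units (two with one process, two with two processes) and 6 communication rounds.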