We consider unicast-based pipelined broadcast schemes for clusters connected by multiple Ethernet switches. By splitting a large broadcast message into segments and broadcasting the segments in a pipelined fashion, pipelined broadcast may achieve very high performance. We develop algorithms for computing various contention-free broadcast trees on Ethernet switched clusters that are suitable for pipelined broadcast, and evaluate the schemes through experimentation. The conclusions drawn from our theoretical and experimental study include the following. First, pipelined broadcast can be more effective than other common broadcast schemes, including the ones used in the latest versions of MPICH and LAM/MPI, when the message size is sufficiently large. Second, contention-free broadcast trees are essential for pipelined broadcast to achieve high performance. Finally, while it is difficult to determine the optimal message segment size for pipelined broadcast, finding one size that gives good performance is relatively easy.
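The segment-size trade-off mentioned above can be illustrated with a minimal sketch. It assumes a textbook linear cost model that is not taken from the paper: a contention-free linear chain of nodes, a fixed per-message startup cost `alpha`, and a per-byte transfer cost `beta`. The function names and parameter values are hypothetical and chosen only for illustration.

```python
import math


def pipelined_chain_time(msg_bytes, seg_bytes, nodes, alpha, beta):
    """Estimated time for a pipelined broadcast over a linear chain.

    Assumed model (not from the paper): each hop forwards one segment
    per step, and one step costs alpha + seg_bytes * beta.
    """
    segments = math.ceil(msg_bytes / seg_bytes)
    step = alpha + seg_bytes * beta
    # The last node sees the first segment after (nodes - 1) steps and
    # the remaining (segments - 1) segments one step apart.
    return (segments + nodes - 2) * step


def best_segment_size(msg_bytes, nodes, alpha, beta, candidates):
    """Pick the candidate segment size with the lowest estimated time."""
    return min(
        candidates,
        key=lambda s: pipelined_chain_time(msg_bytes, s, nodes, alpha, beta),
    )


if __name__ == "__main__":
    # Hypothetical parameters: 40-byte message, 4 nodes, alpha=1.0, beta=0.25.
    for seg in (1, 2, 4, 8, 40):
        print(seg, pipelined_chain_time(40, seg, 4, 1.0, 0.25))
    print("best:", best_segment_size(40, 4, 1.0, 0.25, [1, 2, 4, 8, 40]))
```

Under this assumed model, very small segments pay the startup cost too often and very large segments lose the pipelining overlap, while several mid-range sizes give similar estimates. That is consistent with the observation that a good segment size is easier to find than the optimal one.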