Clusters of workstations employ flexible topologies: regular, irregular, and hierarchical topologies have all been used in such systems. This flexibility poses challenges for developing efficient collective communication algorithms, since the network topology can have a strong impact on communication performance. In this paper, we consider the all-to-all broadcast operation on clusters with cut-through and store-and-forward switches. We show that, when the message size is sufficiently large, near-optimal all-to-all broadcast on a cluster with any topology can be achieved using only the links in a spanning tree of the topology. This result implies that increasing network connectivity beyond the minimum tree connectivity does not improve the performance of the all-to-all broadcast operation when the most efficient topology-specific algorithm is used. We develop all-to-all broadcast algorithms that achieve near-optimal performance on clusters with cut-through switches and on clusters with store-and-forward switches. We evaluate the algorithms through experiments and simulations, and the empirical results confirm our theoretical finding.
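The abstract does not spell out the algorithms, so the following is only an illustrative sketch of why tree links alone can be enough, not the paper's method. It assumes a hypothetical tree in which machines are the degree-1 nodes and switches are the interior nodes, orders the machines by a depth-first traversal, and connects consecutive machines into a logical ring. In such an ordering, every directed tree link is used by at most one ring hop, so all machines can forward simultaneously without contention. Each machine then receives the other P - 1 messages over its single incoming link in P - 1 steps, which is the minimum number of message receptions it needs in any case. All node names and helper functions below are made up for illustration.

```python
from collections import Counter, deque

def dfs_machine_order(tree, root):
    """Machines (assumed here to be degree-1 nodes) in depth-first order."""
    order, stack, seen = [], [root], {root}
    while stack:
        node = stack.pop()
        if len(tree[node]) == 1:          # leaf, treated as a machine
            order.append(node)
        # Push unvisited neighbours in reverse so they pop in listed order.
        for nxt in reversed(tree[node]):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return order

def tree_path(tree, src, dst):
    """Unique path between two nodes of a tree, found by BFS."""
    parent, queue = {src: None}, deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            break
        for nxt in tree[node]:
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    path, node = [], dst
    while node is not None:
        path.append(node)
        node = parent[node]
    return path[::-1]

def ring_link_usage(tree, ring):
    """Count how many ring hops cross each directed tree link."""
    usage = Counter()
    for i, src in enumerate(ring):
        dst = ring[(i + 1) % len(ring)]
        path = tree_path(tree, src, dst)
        usage.update(zip(path, path[1:]))
    return usage

def ring_allgather(ring):
    """Simulate P-1 ring steps; each machine forwards the block it last got."""
    have = {m: {m} for m in ring}
    forward = {m: m for m in ring}
    for _ in range(len(ring) - 1):
        incoming = {ring[(i + 1) % len(ring)]: forward[src]
                    for i, src in enumerate(ring)}
        for dst, block in incoming.items():
            have[dst].add(block)
            forward[dst] = block
    return have

# Hypothetical spanning tree: switches s0..s2, machines n0..n4.
tree = {
    "s0": ["s1", "n4", "s2"],
    "s1": ["s0", "n0", "n1"],
    "s2": ["s0", "n2", "n3"],
    "n0": ["s1"], "n1": ["s1"], "n2": ["s2"], "n3": ["s2"], "n4": ["s0"],
}
ring = dfs_machine_order(tree, "s0")
usage = ring_link_usage(tree, ring)
have = ring_allgather(ring)
```

Checking `max(usage.values())` on this example shows that no directed link carries more than one ring hop, and `have` shows every machine holding all five blocks after four steps. Whether this matches the schedules developed in the paper, particularly for store-and-forward switches, is not something the abstract confirms.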