Dynamic parallel access to replicated content in the internet
IEEE/ACM Transactions on Networking (TON)
Bandwidth-Efficient Collective Communication for Clustered Wide Area Systems
IPDPS '00 Proceedings of the 14th International Symposium on Parallel and Distributed Processing
Scheduling Algorithms for Efficient Gather Operations in Distributed Heterogeneous Systems
ICPP '00 Proceedings of the 2000 International Workshop on Parallel Processing
Efficient Collective Communication in Distributed Heterogeneous Systems
ICDCS '99 Proceedings of the 19th IEEE International Conference on Distributed Computing Systems
Broadcast Trees for Heterogeneous Platforms
IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Papers - Volume 01
Two algorithms of irregular scatter/gather operations for heterogeneous platforms
EuroMPI'10 Proceedings of the 17th European MPI users' group meeting conference on Recent advances in the message passing interface
Pipelined broadcast on ethernet switched clusters
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
Collective Receiver-Initiated Multicast for Grid Applications
IEEE Transactions on Parallel and Distributed Systems
Implementation and usage of the PERUSE-Interface in open MPI
EuroPVM/MPI'06 Proceedings of the 13th European PVM/MPI User's Group conference on Recent advances in parallel virtual machine and message passing interface
SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Hi-index | 0.00 |
Collective communication in high-performance computing is traditionally implemented as a sequence of point-to-point communication operations. For example, in MPI a broadcast is often implemented using a linear or binomial tree algorithm. These algorithms are inherently unaware of any underlying network heterogeneity. Integrating topology awareness into the algorithms is the traditional way to address this heterogeneity, and it has been demonstrated to greatly optimize tree-based collectives. However, recent research in distributed computing shows that in highly heterogeneous networks an alternative class of collective algorithms - BitTorrent-based multicasts - has the potential to outperform topology-aware tree-based collective algorithms. In this work, we experimentally compare the performance of BitTorrent and tree-based large-message broadcast algorithms in a typical heterogeneous computational cluster. We address the following question: Can the dynamic data exchange in BitTorrent be faster than the static data distribution via trees even in the context of high-performance computing? We find that both classes of algorithms have a justification of use for different settings. While on single switch clusters linear tree algorithms are optimal, once multiple switches and a bottleneck link are introduced, BitTorrent broadcasts --- which utilize the network in a more adaptive way --- outperform the tree-based MPI implementations.