Clusters are high-performance computing systems built from standard off-the-shelf workstations connected by fast but standard communication devices. This structure offers higher raw processing power and lower hardware costs than other supercomputers. One main disadvantage of clusters is the lower communication throughput between the processing elements, as standard communication methods usually perform worse than the much more expensive special-purpose communication devices of supercomputers. It is therefore very important to make the most of the communication capacity available in a cluster environment. This paper presents a method of enhancing the performance of the broadcast group communication primitive by using a new algorithm that takes advantage of message decomposition and asynchronous communication. When used in a fully switched cluster environment, the new solution provides a constant execution time, independent of the number of participants. Test measurements show that the algorithm closely follows the predicted behavior and outperforms the binomial tree method widely used in standard message passing libraries. As broadcasting is a building block of various group communication primitives, improving its performance may benefit several routines of message passing libraries.
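
The claim of an execution time independent of the number of participants can be illustrated with a simple latency-bandwidth cost model. The sketch below is not the paper's algorithm, which the abstract does not specify. It assumes a hypothetical scatter-then-exchange scheme on a full-duplex, fully switched network: the root splits the message into one piece per receiver, and the receivers then forward their pieces to each other in parallel. The function names, the parameters `alpha` (per-message latency in seconds) and `beta` (transfer time per byte), and the numeric values are all assumptions chosen for illustration.

```python
def binomial_tree_time(p, m, alpha, beta):
    """Estimated broadcast time with a binomial tree.

    The tree needs ceil(log2(p)) rounds, and every round forwards
    the whole message of m bytes.
    """
    if p <= 1:
        return 0.0
    rounds = (p - 1).bit_length()  # equals ceil(log2(p)) for p >= 2
    return rounds * (alpha + m * beta)


def decomposed_time(p, m, alpha, beta):
    """Estimated time of a hypothetical scatter-then-exchange broadcast.

    Phase 1: the root sends p-1 pieces of m/(p-1) bytes, one per receiver.
    Phase 2: each receiver forwards its piece to the other p-2 receivers;
    on a full-duplex switch the receivers do this in parallel, so the
    phase costs as much as one receiver's p-2 sends.
    """
    if p <= 1:
        return 0.0
    piece = m / (p - 1)
    scatter = (p - 1) * (alpha + piece * beta)
    exchange = (p - 2) * (alpha + piece * beta)
    return scatter + exchange


if __name__ == "__main__":
    m = 1_000_000   # message size in bytes (illustrative)
    alpha = 0.0     # latency ignored to isolate the bandwidth term
    beta = 1e-8     # seconds per byte (illustrative)
    for p in (2, 8, 64, 512):
        print(p,
              binomial_tree_time(p, m, alpha, beta),
              decomposed_time(p, m, alpha, beta))
```

In this model the bandwidth term of the decomposed scheme stays below `2 * m * beta` for any number of participants, while the binomial tree cost grows with `log2(p)`. With a nonzero `alpha` the per-message latency term still grows with `p`, so the model only approximates constant time for large messages.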