Communication performance issues for two cluster computers
ACSC '03 Proceedings of the 26th Australasian computer science conference - Volume 16
A Reconfigurable MPI Broadcast Function
HPCASIA '05 Proceedings of the Eighth International Conference on High-Performance Computing in Asia-Pacific Region
Bandwidth optimal all-reduce algorithms for clusters of workstations
Journal of Parallel and Distributed Computing
OhHelp: a scalable domain-decomposing dynamic load balancing for particle-in-cell simulations
Proceedings of the 23rd international conference on Supercomputing
A proposal of reconfigurable MPI collective communication functions
ISPA'05 Proceedings of the Third international conference on Parallel and Distributed Processing and Applications
This paper gives a performance analysis of the All-Gather, All-Reduce and Reduce-Scatter collective communication operations on a Beowulf cluster. This cluster has a contention-free switch-based network with multiple network interface cards per node, permitting overlapping of message transmission under certain circumstances. As well as considering traditional algorithms developed previously for parallel computers with vendor-specific networks, we also examine simpler algorithms made up of repeated sub-operations, such as broadcasts. We find that for the kind of network on the Beowulf cluster, a somewhat different performance modelling of the algorithms is required, and that some simple simulation tools had to be developed in order to fully understand some of the algorithms' performance. Our results indicate that the LAM MPI implementations for these operations may be significantly improved, and the algorithms with data exchange and potential contention perform well on the cluster. Furthermore, they indicate that algorithms permitting message overlap are slightly favoured, with a new and simple algorithm which modestly out-performs the best traditional algorithms in the case of Reduce-Scatter. With the exception that the degree of overlapping proved difficult to estimate, our performance models fitted closely with the results, and together with the simulation tools, permit a detailed understanding of the cluster's communication pattern performance.
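
To make the idea of an algorithm "made up of repeated sub-operations, such as broadcasts" concrete, the following is a minimal in-memory sketch in Python. It is not the paper's code and not LAM MPI code: the function names `allgather_by_broadcasts` and `allgather_ring` are hypothetical, processes are modelled as list indices, and no real network, timing, or message overlap is simulated. It only shows that an All-Gather built from one broadcast per process delivers the same result as a conventional ring All-Gather, which is assumed here as a representative traditional algorithm.

```python
def allgather_by_broadcasts(local_blocks):
    """All-Gather built from p successive broadcasts, one rooted at each process.

    local_blocks[r] is the block initially owned by process r (an assumption
    of this sketch). Returns one receive buffer per process.
    """
    p = len(local_blocks)
    gathered = [[None] * p for _ in range(p)]
    for root in range(p):
        # One broadcast: every process receives the root's block.
        for rank in range(p):
            gathered[rank][root] = local_blocks[root]
    return gathered


def allgather_ring(local_blocks):
    """Ring All-Gather: in each of p-1 steps, every process forwards one
    block to its right-hand neighbour."""
    p = len(local_blocks)
    gathered = [[None] * p for _ in range(p)]
    for rank in range(p):
        gathered[rank][rank] = local_blocks[rank]
    for step in range(p - 1):
        # Snapshot the buffers so all sends in a step act on the same state.
        snapshot = [row[:] for row in gathered]
        for rank in range(p):
            send_idx = (rank - step) % p   # block received in the previous step
            dest = (rank + 1) % p
            gathered[dest][send_idx] = snapshot[rank][send_idx]
    return gathered


if __name__ == "__main__":
    blocks = ["a", "b", "c", "d"]
    # Both variants leave every process holding all four blocks in rank order.
    print(allgather_by_broadcasts(blocks) == allgather_ring(blocks))
```

The two variants differ in how many messages are in flight at once and in which links they use, which is the kind of difference a performance model or simulation tool for a given network would need to capture; this sketch deliberately leaves that out.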