The Analysis and Optimization of Collective Communications on a Beowulf Cluster

  • Authors:
  • Affiliations:
  • Venue: ICPADS '02 Proceedings of the 9th International Conference on Parallel and Distributed Systems
  • Year: 2002

Abstract

This paper gives a performance analysis of the All-Gather, All-Reduce and Reduce-Scatter collective communication operations on a Beowulf cluster. This cluster has a contention-free switch-based network with multiple network interface cards per node, permitting overlapping of message transmission under certain circumstances. As well as considering traditional algorithms developed previously for parallel computers with vendor-specific networks, we also examine simpler algorithms made up of repeated sub-operations, such as broadcasts. We find that for the kind of network on the Beowulf cluster, a somewhat different performance modelling of the algorithms is required, and that some simple simulation tools had to be developed in order to fully understand some of the algorithms' performance. Our results indicate that the LAM MPI implementations for these operations may be significantly improved, and the algorithms with data exchange and potential contention perform well on the cluster. Furthermore, they indicate that algorithms permitting message overlap are slightly favoured, with a new and simple algorithm which modestly out-performs the best traditional algorithms in the case of Reduce-Scatter. With the exception that the degree of overlapping proved difficult to estimate, our performance models fitted closely with the results, and together with the simulation tools, permit a detailed understanding of the cluster's communication pattern performance.
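
For readers unfamiliar with the three operations named in the abstract, the following is a minimal sketch of what each one computes, written as plain Python over in-memory lists. It is not the paper's code, not LAM MPI's implementation, and it models no network timing. The function names and the list-of-lists representation (one inner list per process) are assumptions made for illustration only. The `all_gather` function is built from repeated broadcasts, in the spirit of the "repeated sub-operations" the abstract mentions; the actual algorithms studied in the paper may differ.

```python
from functools import reduce
from operator import add


def broadcast(buffers, root):
    """Every process receives a copy of the root process's buffer."""
    return [list(buffers[root]) for _ in buffers]


def all_gather(buffers):
    """All-Gather expressed as p broadcasts, one rooted at each process.

    Illustrative only: each process ends up with the concatenation of
    every process's buffer, in rank order.
    """
    p = len(buffers)
    result = [[] for _ in range(p)]
    for root in range(p):
        received = broadcast(buffers, root)
        for rank in range(p):
            result[rank].extend(received[rank])
    return result


def all_reduce(buffers, op=add):
    """Every process ends up with the element-wise reduction of all buffers."""
    combined = [reduce(op, column) for column in zip(*buffers)]
    return [list(combined) for _ in buffers]


def reduce_scatter(buffers, op=add):
    """Process i ends up with block i of the element-wise reduction.

    Assumes (for simplicity) that the buffer length is divisible by the
    number of processes.
    """
    p = len(buffers)
    combined = [reduce(op, column) for column in zip(*buffers)]
    block = len(combined) // p
    return [combined[i * block:(i + 1) * block] for i in range(p)]


if __name__ == "__main__":
    # Two hypothetical processes, each holding two values.
    data = [[1, 2], [3, 4]]
    print(all_gather(data))
    print(all_reduce(data))
    print(reduce_scatter(data))
```

With two processes holding `[1, 2]` and `[3, 4]`, All-Gather leaves both with all four values, All-Reduce leaves both with the element-wise sums, and Reduce-Scatter leaves each process with one block of those sums.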