Efficient implementation of reduce-scatter in MPI

  • Authors:
  • Massimo Bernaschi;Giulio Iannello;Mario Lauria

  • Affiliations:
  • Istituto Applicazioni del Calcolo, CNR, Viale del Policlinico 137, I-00161 Rome, Italy;Dipartimento di Informatica e Sistemistica, Università di Napoli, Via Claudio 21, I-80125 Napoli, Italy;Department of Computer and Information Science, The Ohio State University, 2015 Neil Ave, Columbus, OH

  • Venue:
  • Journal of Systems Architecture: the EUROMICRO Journal - Special issue: Parallel, distributed and network-based processing
  • Year:
  • 2003

Abstract

We discuss the efficient implementation of a collective operation called reduce-scatter, which is defined in the MPI standard. The reduce-scatter is equivalent to the combination of a reduction on vectors of length n with a scatter of the resulting n-vector to all processors. We describe the implementation issues and the performance characterization of two recently proposed algorithms for the reduce-scatter that have been proven to be highly efficient in theory under the assumption of a fully connected parallel system. A performance comparison with existing mainstream implementations of the operation is presented, which confirms the practical advantage of the new algorithms. Experiments show that the two algorithms have different characteristics, which make them complementary in providing a performance gain over standard algorithms. Our study has been carried out on two different platforms: an IBM SP2 and a Myrinet-interconnected cluster of Pentium Pro machines. However, most of the results reported here are not specific to either MPI or the platforms used, and they hold in general for any message-passing programming system.
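The definition above (a reduction on length-n vectors followed by a scatter of the reduced vector) can be made concrete with a small sequential simulation. This is a sketch for illustration only, not the authors' implementation. The function names are hypothetical, and the p processes are simulated inside one Python program. `recvcounts` plays the role of the argument of the same name in MPI_Reduce_scatter. The second function uses recursive halving, a well-known pairwise-exchange algorithm that assumes p is a power of two, n is divisible by p, and the operator is associative and commutative. It is not necessarily either of the two algorithms studied in the paper.

```python
import operator
from functools import reduce


def reduce_scatter_reference(vectors, recvcounts, op=operator.add):
    """Naive reduce-scatter: reduce all vectors, then scatter blocks.

    vectors[i] is the length-n input of simulated process i;
    recvcounts[i] is how many result elements process i receives.
    """
    n = len(vectors[0])
    assert all(len(v) == n for v in vectors)
    assert sum(recvcounts) == n
    # Step 1: element-wise reduction of the p input vectors.
    reduced = [reduce(op, column) for column in zip(*vectors)]
    # Step 2: scatter consecutive blocks of the reduced n-vector.
    blocks, offset = [], 0
    for count in recvcounts:
        blocks.append(reduced[offset:offset + count])
        offset += count
    return blocks


def reduce_scatter_recursive_halving(vectors, op=operator.add):
    """Recursive-halving reduce-scatter with equal block sizes.

    Assumes p is a power of two, n % p == 0, and op is associative
    and commutative. At each step, process r pairs with r XOR dist,
    keeps half of its current block range, and combines that half
    with the partner's copy (simulating the send/receive exchange).
    """
    p, n = len(vectors), len(vectors[0])
    assert p & (p - 1) == 0 and n % p == 0
    block = n // p
    buf = [list(v) for v in vectors]
    lo, hi = [0] * p, [p] * p  # block range [lo, hi) each process still owns
    dist = p // 2
    while dist >= 1:
        new_buf = [b[:] for b in buf]
        new_lo, new_hi = lo[:], hi[:]
        for rank in range(p):
            partner = rank ^ dist  # partner shares the same [lo, hi) range
            mid = (lo[rank] + hi[rank]) // 2
            if rank & dist == 0:
                new_lo[rank], new_hi[rank] = lo[rank], mid
            else:
                new_lo[rank], new_hi[rank] = mid, hi[rank]
            # Combine the kept half with the partner's previous-step values.
            for i in range(new_lo[rank] * block, new_hi[rank] * block):
                new_buf[rank][i] = op(buf[rank][i], buf[partner][i])
        buf, lo, hi = new_buf, new_lo, new_hi
        dist //= 2
    # After log2(p) steps, process r holds the fully reduced block r.
    return [buf[r][r * block:(r + 1) * block] for r in range(p)]


if __name__ == "__main__":
    vectors = [[10 * rank + j for j in range(8)] for rank in range(4)]
    print(reduce_scatter_reference(vectors, [2, 2, 2, 2]))
    # [[60, 64], [68, 72], [76, 80], [84, 88]]
    print(reduce_scatter_recursive_halving(vectors))
    # [[60, 64], [68, 72], [76, 80], [84, 88]]
```

The naive version matches the definition directly. In a real message-passing system, it funnels all n elements through the reduction root before scattering. Recursive halving instead halves the data each process sends at every one of the log2(p) steps. This contrast is one reason alternatives to the standard reduce-then-scatter approach are of practical interest.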