A decomposition approach for optimizing the performance of MPI libraries

  • Authors:
  • Olaf Hartmann;Matthias Kühnemann;Thomas Rauber;Gudula Rünger

  • Affiliations:
  • Chemnitz University of Technology, Department of Computer Science, Chemnitz, Germany;Chemnitz University of Technology, Department of Computer Science, Chemnitz, Germany;University Bayreuth, Department of Computer Science, Bayreuth, Germany;Chemnitz University of Technology, Department of Computer Science, Chemnitz, Germany

  • Venue:
  • IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
  • Year:
  • 2006

Abstract

MPI provides a portable message-passing interface for many parallel execution platforms, but it may lead to inefficiencies for some platforms and applications. In this article we show that the performance of both standard and vendor-specific libraries can be improved by organizing the processors orthogonally in 2D or 3D meshes and decomposing the collective communication operations into several phases. We describe an adaptive approach with a configuration phase that determines, for a specific execution platform and a specific MPI library, which decomposition leads to the best performance. The best decomposition may also depend on the number of processors and the size of the messages to be transferred. The decomposition approach has been implemented as a library extension that is called on each activation of a collective MPI operation. As a result, neither the application programs nor the MPI library need to be changed, yet many collective MPI operations show significant performance improvements.
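
The idea of decomposing a collective operation over a 2D mesh and choosing the mesh shape in a configuration phase can be illustrated with a small pure-Python simulation. This is only a sketch under stated assumptions, not the authors' library: the function names (mesh_shapes, decomposed_bcast, configure, toy_cost) are hypothetical, no MPI calls are made, and the cost model is a placeholder that stands in for timings a real configuration phase would measure on the target platform.

```python
from typing import Callable, Dict, List, Tuple

Shape = Tuple[int, int]


def mesh_shapes(p: int) -> List[Shape]:
    """All 2D factorizations p = rows * cols, including the flat 1 x p case."""
    return [(r, p // r) for r in range(1, p + 1) if p % r == 0]


def decomposed_bcast(values: Dict[int, object], root: int,
                     rows: int, cols: int) -> Dict[int, object]:
    """Simulate a broadcast from `root` on a rows x cols mesh in two phases.

    Rank k is placed at mesh position (k // cols, k % cols).
    """
    result = dict(values)
    root_col = root % cols
    # Phase 1: the root broadcasts within its own column,
    # so exactly one rank in every row holds the message.
    for r in range(rows):
        result[r * cols + root_col] = result[root]
    # Phase 2: every row broadcasts from its rank in the root's column.
    # On a real machine these row broadcasts would run concurrently.
    for r in range(rows):
        leader = r * cols + root_col
        for c in range(cols):
            result[r * cols + c] = result[leader]
    return result


def toy_cost(shape: Shape) -> float:
    """Placeholder cost model (an assumption, not a measurement).

    A flat broadcast among n ranks is charged n - 1 steps; the two
    phases run one after the other, so their costs add up.
    """
    rows, cols = shape
    return (rows - 1) + (cols - 1)


def configure(p: int, measure: Callable[[Shape], float]) -> Shape:
    """Configuration phase: pick the mesh shape with the lowest cost."""
    return min(mesh_shapes(p), key=measure)


if __name__ == "__main__":
    data = {rank: None for rank in range(12)}
    data[5] = "payload"
    after = decomposed_bcast(data, root=5, rows=3, cols=4)
    print(all(v == "payload" for v in after.values()))
    print(configure(16, toy_cost))
```

In a real setting, the measure callable would time the actual collective operation for each candidate shape, and separately for each message size of interest, because the abstract notes that the best decomposition can depend on both the processor count and the message size.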