Application-Bypass Broadcast in MPICH over GM

  • Authors:
  • Darius Buntinas;Dhabaleswar K. Panda;Ron Brightwell

  • Venue:
  • CCGRID '03 Proceedings of the 3rd International Symposium on Cluster Computing and the Grid
  • Year:
  • 2003

Abstract

Processes of a parallel program can become unsynchronized, or skewed, during the course of running an application. Processes can become skewed as a result of unbalanced or asymmetric code, or through the use of heterogeneous systems, where nodes in the system have different performance characteristics, as well as random, unpredictable effects such as the processes not being started at exactly the same time, or processors receiving interrupts during computation. Geographically distributed systems may have more severe skew because of variable communication times. Such skew can have a significant impact on the performance of collective communication operations, which impose an implicit synchronization. The broadcast operation in MPICH is one such operation. An application-bypass broadcast operation is one which does not depend on the application running at a process to make progress. Such an operation would not be as sensitive to process skew. This paper describes the design and implementation of an application-bypass broadcast operation. We evaluated the implementation and found a factor of improvement of up to 16 for application-bypass broadcast compared to non-application-bypass broadcast when processes are skewed. Furthermore, we see that as the system size increases, the effects of skew on non-application-bypass broadcast also increase. The application-bypass broadcast is much less sensitive to process skew, which makes it more scalable than the non-application-bypass broadcast operation.