Journal of Parallel and Distributed Computing
This paper presents an effective all-reduction algorithm and its implementation on the Message Passing Interface (MPI). Its performance is comparatively stable not only for composite numbers of processors but also for prime numbers, because we introduce a process detachment strategy at each factorization stage. In a preliminary test, we examine its efficiency, and we discuss and compare it with existing algorithms by introducing a performance model of our algorithm.