Large-scale parallel applications that perform global synchronization may spend a significant share of their execution time waiting for a barrier operation to complete. Consequently, numerous research efforts have focused on reducing the communication cost of synchronization primitives. So far, however, there has been no exhaustive comparison of barrier algorithms. This paper investigates significant representatives of this family of algorithms and evaluates their differing characteristics in order to assess their properties within a specific scenario. The first part of this work introduces four runtime complexity classes, to which all barrier algorithms are known to belong. The LogP model is then used to analyze the behavior and predict the running time of one representative algorithm from each class. These predictions are checked against measurements of original implementations based on the Open MPI framework, and in doing so this work shows how to leverage the flexible component architecture of this new MPI implementation, which has proved to be an ideal research tool.
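To make the idea of a LogP-based running-time prediction concrete, the following is a minimal sketch for one well-known barrier with logarithmic round count, the dissemination barrier. The abstract does not name the representative algorithms, so this choice is an assumption made purely for illustration. The sketch also assumes that each round costs one message, modeled as L + 2o (latency plus send and receive overhead), and it ignores the gap parameter g and any network contention. The function names and the parameter values are hypothetical.

```python
def dissemination_rounds(p: int) -> int:
    """Rounds needed by a dissemination barrier: ceil(log2(p))."""
    # (p - 1).bit_length() equals ceil(log2(p)) for p >= 1, without float error.
    return (p - 1).bit_length()


def logp_dissemination_time(p: int, L: float, o: float) -> float:
    """Simplified LogP estimate: every round costs one message, L + 2*o.

    Assumption: the gap g and contention are ignored.
    """
    return dissemination_rounds(p) * (L + 2 * o)


def simulate_dissemination(p: int) -> bool:
    """Check that after all rounds every rank has heard from every other rank."""
    # knows[i] is the set of ranks whose arrival rank i has learned about.
    knows = [{i} for i in range(p)]
    for k in range(dissemination_rounds(p)):
        dist = 1 << k
        # All sends in a round carry the state from before the round started.
        snapshot = [set(s) for s in knows]
        for i in range(p):
            # In round k, rank i notifies rank (i + 2^k) mod p.
            knows[(i + dist) % p] |= snapshot[i]
    return all(len(s) == p for s in knows)


if __name__ == "__main__":
    # Hypothetical LogP parameters in microseconds.
    print(logp_dissemination_time(8, L=5.0, o=1.5))  # 3 rounds * 8.0 = 24.0
    print(simulate_dissemination(8))                 # True
```

The simulation only checks the synchronization property (every rank has transitively heard from all others). The time estimate is a model prediction of the kind that would still need to be compared against measurements.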