A Practical Approach to the Rating of Barrier Algorithms Using the LogP Model and Open MPI

Authors:
Torsten Hoefler, Lavinio Cerquetti, Frank Mietke
Affiliations:
Chemnitz University of Technology (all three authors)
Venue:
ICPPW '05 Proceedings of the 2005 International Conference on Parallel Processing Workshops
Year:
2005

Abstract

Large-scale parallel applications that perform global synchronization may spend a significant share of their execution time waiting for barrier operations to complete. Consequently, numerous studies have focused on reducing the communication cost of synchronization primitives. So far, however, there has been no exhaustive comparison of barrier algorithms. This paper investigates significant representatives of this family of algorithms and evaluates their diverging characteristics in order to assess their properties in the context of a specific scenario. The first part of this work introduces four run-time complexity classes to which all known barrier algorithms belong. The LogP model is then used to analyze the behavior and predict the running time of a representative algorithm from each class. These predictions are checked against measurements of original implementations built on the Open MPI framework. In doing so, this work shows how to leverage the flexible component architecture of this new MPI implementation, which has proved to be an ideal research tool.
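To make the idea of LogP-based running-time prediction concrete, here is a minimal sketch that is not taken from the paper. It estimates the cost of two textbook barrier schemes: a central (linear) barrier and a dissemination barrier. It assumes the standard LogP parameters L (latency), o (per-message overhead), g (gap) and P (number of processes), small single-word messages, and simplified cost formulas derived here purely for illustration. The functions `central_barrier`, `dissemination_barrier` and `rounds_log2` and the sample parameter values are hypothetical. The paper's four complexity classes and its exact formulas may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogP:
    """LogP machine parameters (times in microseconds; values are illustrative)."""
    L: float  # upper bound on network latency of a small message
    o: float  # CPU overhead to send or to receive one message
    g: float  # minimum gap between consecutive messages at one processor


def rounds_log2(P: int) -> int:
    """ceil(log2 P) for P >= 1, computed exactly with integer arithmetic."""
    return (P - 1).bit_length()


def central_barrier(m: LogP, P: int) -> float:
    """Hypothetical estimate for a central (linear) barrier.

    Phase 1: P-1 processes each send an arrival message to a root.
    Phase 2: the root sends P-1 release messages back.
    The root serializes its P-1 receives (and later its P-1 sends) at one
    message per max(g, o); the last message still pays L + 2o end to end.
    """
    if P < 2:
        return 0.0
    one_msg = m.L + 2 * m.o          # send overhead + latency + receive overhead
    per_msg = max(m.g, m.o)          # root's serialization interval per message
    phase = one_msg + (P - 2) * per_msg
    return 2 * phase                 # arrival phase + release phase


def dissemination_barrier(m: LogP, P: int) -> float:
    """Hypothetical estimate for a dissemination barrier.

    In round k, process i signals (i + 2**k) mod P and waits for a signal
    from (i - 2**k) mod P, for ceil(log2 P) rounds. Each round costs one
    message, L + 2o, unless the gap g between sends is larger.
    """
    if P < 2:
        return 0.0
    return rounds_log2(P) * max(m.L + 2 * m.o, m.g)


if __name__ == "__main__":
    params = LogP(L=5.0, o=1.0, g=2.0)  # made-up values, not measurements
    for P in (2, 8, 64):
        print(P, central_barrier(params, P), dissemination_barrier(params, P))
```

With these made-up parameters the central barrier grows linearly in P (the 2(P-2)·max(g, o) term), while the dissemination barrier grows only with ceil(log2 P). This kind of gap is what separates algorithms into distinct run-time complexity classes, although real measurements can deviate from such simple estimates.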