Barrier is a collective operation that many scientific applications and parallel libraries use for synchronization. A Barrier operation is typically implemented by exchanging short data messages, which must be demultiplexed on receipt and therefore add undesired latency to the operation. In this work, we reduce the latency of Barrier operations on Cray XE/XK systems by leveraging the atomic operations provided by the Gemini interconnect, tailoring the Barrier algorithms to these capabilities, and adopting a hierarchical design to arrive at an efficient implementation. Our micro-benchmark evaluation shows that, for a 4,096-process Barrier operation, the atomic-operations-based Barrier outperforms the data-exchange Barrier by 52% and the native Barrier by 111%.
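The general idea of a hierarchical Barrier built on atomic counters, rather than on exchanged messages, can be illustrated with a small sketch. It is not the implementation evaluated in this work and does not use any Gemini interface: threads stand in for processes, a lock-protected integer (the hypothetical `AtomicCounter`) stands in for a network atomic fetch-and-add, and the class names, the `between` hook, and the two-level grouping are assumptions made for illustration only.

```python
import threading
import time


class AtomicCounter:
    """Hypothetical stand-in for a hardware atomic; a lock emulates fetch-and-add."""

    def __init__(self):
        self._value = 0
        self._lock = threading.Lock()

    def fetch_add(self, delta):
        # Add delta and return the value held before the addition.
        with self._lock:
            old = self._value
            self._value += delta
            return old

    def load(self):
        with self._lock:
            return self._value


class CounterBarrier:
    """Flat barrier: arrivals bump a counter, the last arrival releases the rest."""

    def __init__(self, parties):
        self.parties = parties
        self.arrived = AtomicCounter()
        self.generation = AtomicCounter()

    def wait(self, between=None):
        gen = self.generation.load()
        if self.arrived.fetch_add(1) == self.parties - 1:
            # Last arrival: reset the counter, run the optional hook, then release.
            self.arrived.fetch_add(-self.parties)
            if between is not None:
                between()
            self.generation.fetch_add(1)
        else:
            # Everyone else polls the generation counter until it advances.
            while self.generation.load() == gen:
                time.sleep(0)


class HierarchicalBarrier:
    """Two levels: one barrier per node, plus one barrier across node leaders."""

    def __init__(self, nodes, per_node):
        self.local = [CounterBarrier(per_node) for _ in range(nodes)]
        self.top = CounterBarrier(nodes)

    def wait(self, node):
        # The last arrival on each node joins the top-level barrier on behalf
        # of its node, and only then releases its local peers.
        self.local[node].wait(between=self.top.wait)


def run_demo(nodes=2, per_node=3, rounds=5):
    """Return how many times a thread left the barrier before all had arrived."""
    total = nodes * per_node
    barrier = HierarchicalBarrier(nodes, per_node)
    checked_in = [AtomicCounter() for _ in range(rounds)]
    violations = AtomicCounter()

    def worker(node):
        for r in range(rounds):
            checked_in[r].fetch_add(1)
            barrier.wait(node)
            if checked_in[r].load() != total:
                violations.fetch_add(1)

    threads = [
        threading.Thread(target=worker, args=(n,))
        for n in range(nodes)
        for _ in range(per_node)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return violations.load()


if __name__ == "__main__":
    print(run_demo())
```

In this sketch only one participant per node touches the top-level counter, which is the property a hierarchical design relies on to limit traffic between nodes; the latency figures reported above depend on hardware atomics and are not reproduced by this thread-based emulation.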