Implementation and performance analysis of non-blocking collective operations for MPI
Proceedings of the 2007 ACM/IEEE conference on Supercomputing
Kernel-based offload of collective operations: implementation, evaluation and lessons learned
Euro-Par'11 Proceedings of the 17th international conference on Parallel processing - Volume Part II
Design and Implementation of Portable and Efficient Non-blocking Collective Communication
CCGRID '12 Proceedings of the 2012 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (ccgrid 2012)
Overlapping computation with communication, for collective as well as point-to-point operations, is an important technique for improving the performance of parallel programs. Because most current non-blocking collective implementations use an extra thread to progress communication, they incur overhead from thread scheduling and context switching. This paper proposes KACC, a new non-blocking communication facility that provides fast asynchronous collective communication. KACC is implemented in the OS kernel interrupt context, so it performs non-blocking asynchronous collective operations without an extra thread. Experimental results show that the CPU time cost of this method is sufficiently small.
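
To make the overlap pattern concrete, the following is a minimal toy sketch of the thread-based progression that the abstract describes as the common baseline. It is not KACC and does not use MPI: the class `NonBlockingAllreduce`, the function `overlapped_step`, and the simulated per-rank contributions are all hypothetical names introduced here for illustration, and the "collective" is just a sum computed by a helper thread.

```python
import threading


class NonBlockingAllreduce:
    """Toy stand-in for a non-blocking sum-allreduce progressed by a helper thread.

    Hypothetical illustration only: a real implementation would exchange
    data between processes instead of summing a local list.
    """

    def __init__(self, contributions):
        self._contributions = list(contributions)  # one value per simulated rank
        self._result = None
        self._thread = threading.Thread(target=self._progress)
        # start() returns immediately, so the caller can keep computing,
        # much as a non-blocking call such as MPI_Iallreduce returns a request.
        self._thread.start()

    def _progress(self):
        # The extra thread performs the communication work in the background.
        # Scheduling and context-switching this thread is the overhead the
        # abstract attributes to thread-based designs.
        self._result = sum(self._contributions)

    def wait(self):
        # Block until the collective has completed (analogous to MPI_Wait).
        self._thread.join()
        return self._result


def overlapped_step(contributions, local_work):
    request = NonBlockingAllreduce(contributions)  # start the collective
    local = sum(x * x for x in local_work)         # independent computation overlaps with it
    total = request.wait()                         # complete the collective
    return total, local
```

Calling `overlapped_step([1, 2, 3, 4], [1, 2, 3])` starts the simulated reduction, computes the local sum of squares while the helper thread runs, and then waits for the reduction. According to the abstract, KACC keeps this start/compute/wait structure for the application but progresses the collective in the OS kernel interrupt context rather than in a helper thread.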