Assessing the performance and scalability of a novel multilevel k-nomial allgather on CORE-Direct systems

Authors:
Joshua S. Ladd;Manjunath Gorentla Venkata;Richard Graham;Pavel Shamis
Affiliations:
Computer Science and Mathematics Division, Oak Ridge National Laboratory, Oak Ridge, TN;Computer Science and Mathematics Division, Oak Ridge National Laboratory, Oak Ridge, TN;Computer Science and Mathematics Division, Oak Ridge National Laboratory, Oak Ridge, TN;Computer Science and Mathematics Division, Oak Ridge National Laboratory, Oak Ridge, TN
Venue:
Euro-Par'12 Proceedings of the 18th international conference on Parallel Processing
Year:
2012

Abstract

In this paper, we propose a novel allgather algorithm, Reindexed Recursive K-ing (RRK), which optimizes MPI_Allgather for Core-Direct enabled systems by combining flexibility in the algorithm's tree topology and the ability to make asynchronous progress with Core-Direct's communication offload capability. In particular, RRK introduces a reindexing scheme that ensures contiguous data transfers while adding only a single additional send and receive operation for any radix, k, or communicator size, N. This improves the algorithm's scalability by avoiding the use of a scatter/gather element (SGE) list on InfiniBand networks. Our implementation and evaluation of the RRK algorithm show that it performs and scales well on Core-Direct systems for a wide range of message sizes and various communicator configurations.
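
As an illustration only, the following minimal Python sketch simulates a plain radix-k recursive exchange allgather for the simple case N = k^m. It is not the RRK algorithm: the reindexing scheme for general N is not described in this abstract, and the function name `recursive_k_allgather` and the set-based buffer model are assumptions made for the example. The sketch shows why the power-of-k case needs no scatter/gather list: after every round, each rank holds one contiguous range of blocks.

```python
def recursive_k_allgather(n, k):
    """Simulate a radix-k recursive exchange allgather on n == k**m ranks.

    Hypothetical helper for illustration; not the paper's RRK implementation.
    Returns (buffers, steps): buffers[r] is the sorted list of block ids held
    by rank r at the end, and steps is the number of exchange rounds.
    """
    if n < 1 or k < 2:
        raise ValueError("need n >= 1 and k >= 2")

    # Find m such that n == k**m; this sketch handles only that case.
    m, size = 0, 1
    while size < n:
        size *= k
        m += 1
    if size != n:
        raise ValueError("this sketch only handles n == k**m")

    held = [{r} for r in range(n)]  # each rank starts with its own block
    dist = 1                        # peer distance k**s in round s
    for _ in range(m):
        new_held = []
        for r in range(n):
            digit = (r // dist) % k   # this rank's base-k digit for the round
            base = r - digit * dist   # group member whose digit is zero
            merged = set()
            for j in range(k):        # exchange with the k group members
                merged |= held[base + j * dist]
            # The merged blocks form one contiguous range, so a real
            # implementation could move them as a single buffer.
            assert max(merged) - min(merged) + 1 == len(merged)
            new_held.append(merged)
        held = new_held
        dist *= k
    return [sorted(h) for h in held], m


if __name__ == "__main__":
    buffers, steps = recursive_k_allgather(27, 3)
    print(steps, buffers[0] == list(range(27)))
```

In this model the number of rounds is log_k N, so a larger radix k trades fewer rounds for more peers per round. Communicator sizes that are not a power of k are rejected here; handling them while keeping transfers contiguous is the case the paper's reindexing scheme addresses.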