Efficient Barrier Using Remote Memory Operations on VIA-Based Clusters

Authors:
Rinku Gupta; Vinod Tipparaju; Jarek Nieplocha; Dhabaleswar Panda
Venue:
CLUSTER '02 Proceedings of the IEEE International Conference on Cluster Computing
Year:
2002

Abstract

Most high-performance scientific applications require efficient support for collective communication. Point-to-point message-passing communication in current-generation clusters is based on the Send/Recv communication model. Collective communication operations built on top of such point-to-point message-passing operations may achieve suboptimal performance. VIA and the emerging InfiniBand architecture support remote DMA (RDMA) operations, which allow data to be moved between nodes with low overhead and make it possible to provide a logical shared memory address space across the nodes. In this paper, we focus on barrier, one of the frequently used collective operations. We demonstrate how RDMA write operations can be used to support inter-node barrier in a cluster with SMP nodes. Combining this with a scheme to exploit shared memory within an SMP node, we develop a fast barrier algorithm for clusters of SMP nodes with a cLAN VIA interconnect. Compared to current barrier algorithms using the Send/Recv communication model, the new approach is shown to reduce barrier latency on a 64-processor (32 dual-processor nodes) system by up to 66%. These results demonstrate that high-performance and scalable barrier implementations can be delivered on current and next-generation VIA/InfiniBand-based clusters with RDMA support.
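
The two-level idea described in the abstract (shared memory inside an SMP node, one-sided writes between nodes) can be illustrated with a small simulation. The abstract does not say which inter-node algorithm the authors use, so the sketch below assumes a dissemination pattern, and it uses Python threads and plain list stores in place of real processes and RDMA hardware. The names `SimulatedCluster`, `rdma_write`, and `run_demo`, and the 4-node size, are hypothetical and are not part of VIA, InfiniBand, or the paper.

```python
import threading
import time

NODES = 4            # hypothetical cluster size, chosen for illustration
PROCS_PER_NODE = 2   # dual-processor SMP nodes, as in the abstract's testbed
ROUNDS = (NODES - 1).bit_length()  # ceil(log2(NODES)) dissemination rounds


class SimulatedCluster:
    """Toy model: each node owns a flag array that peers write into directly."""

    def __init__(self):
        # flags[n][r]: latest barrier epoch announced to node n in round r.
        # On real hardware this would be registered memory targeted by RDMA writes.
        self.flags = [[0] * ROUNDS for _ in range(NODES)]
        self.locks = [threading.Lock() for _ in range(NODES)]
        self.arrived = [0] * NODES    # intra-node arrival counters
        self.released = [0] * NODES   # intra-node release flags (epoch numbers)

    def rdma_write(self, dst, rnd, epoch):
        # Assumption: stands in for a one-sided remote write; no receive is posted.
        self.flags[dst][rnd] = epoch

    def barrier(self, node, epoch):
        # Phase 1: local processes check in through shared memory.
        with self.locks[node]:
            self.arrived[node] += 1
            is_last = self.arrived[node] == PROCS_PER_NODE
            if is_last:
                self.arrived[node] = 0
        if is_last:
            # Phase 2: the last local arriver runs the inter-node rounds.
            for rnd in range(ROUNDS):
                peer = (node + (1 << rnd)) % NODES
                self.rdma_write(peer, rnd, epoch)
                while self.flags[node][rnd] < epoch:  # poll local memory only
                    time.sleep(0)
            # Phase 3: release the other local processes.
            self.released[node] = epoch
        else:
            while self.released[node] < epoch:
                time.sleep(0)


def run_demo(epochs=3):
    """Run several barrier episodes; return the arrival counts seen after each."""
    cluster = SimulatedCluster()
    count_lock = threading.Lock()
    arrivals = [0] * (epochs + 1)
    observed = []

    def worker(node):
        for epoch in range(1, epochs + 1):
            with count_lock:
                arrivals[epoch] += 1
            cluster.barrier(node, epoch)
            with count_lock:
                observed.append(arrivals[epoch])

    threads = [threading.Thread(target=worker, args=(n,))
               for n in range(NODES) for _ in range(PROCS_PER_NODE)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return observed


if __name__ == "__main__":
    counts = run_demo()
    print(all(c == NODES * PROCS_PER_NODE for c in counts))
```

Each flag slot has a single writer and epochs only increase, so a waiting node can safely test `flags[node][rnd] < epoch` even if a faster peer has already announced the next episode. A real implementation would additionally need memory registration and completion handling, which this sketch leaves out.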