Efficient Barrier Using Remote Memory Operations on VIA-Based Clusters

  • Authors:
  • Rinku Gupta;Vinod Tipparaju;Jarek Nieplocha;Dhabaleswar Panda

  • Venue:
  • CLUSTER '02 Proceedings of the IEEE International Conference on Cluster Computing
  • Year:
  • 2002

Quantified Score

Hi-index 0.00

Abstract

Most high-performance scientific applications require efficient support for collective communication. Point-to-point message-passing communication in current-generation clusters is based on the Send/Recv communication model. Collective communication operations built on top of such point-to-point message-passing operations may achieve suboptimal performance. VIA and the emerging InfiniBand architecture support remote DMA (RDMA) operations, which allow data to be moved between nodes with low overhead; they also make it possible to create a logical shared memory address space across the nodes. In this paper, we focus on barrier, one of the frequently used collective operations. We demonstrate how RDMA write operations can be used to support inter-node barrier in a cluster with SMP nodes. Combining this with a scheme to exploit shared memory within an SMP node, we develop a fast barrier algorithm for clusters of SMP nodes with a cLAN VIA interconnect. Compared to current barrier algorithms using the Send/Recv communication model, the new approach is shown to reduce barrier latency on a 64-processor (32 dual nodes) system by up to 66%. These results demonstrate that high-performance and scalable barrier implementations can be delivered on current and next-generation VIA/InfiniBand-based clusters with RDMA support.
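
The abstract does not spell out the barrier algorithm itself, so the following is only an illustrative sketch of the general idea, not the authors' implementation. It assumes a two-level scheme: processes on one SMP node synchronize through a shared counter, and one leader per node then takes part in a dissemination-style exchange in which each "remote write" sets a flag directly in a peer's memory. The dissemination pattern, the function names, and the in-process lists that stand in for RDMA-registered memory are all assumptions made for illustration.

```python
def dissemination_rounds(num_nodes):
    """Rounds needed by a dissemination barrier: ceil(log2(num_nodes))."""
    return (num_nodes - 1).bit_length()


def simulate_inter_node_barrier(num_nodes):
    """Simulate the flag writes of a dissemination barrier among node leaders.

    Returns (rounds, known), where known[i] is the set of nodes whose
    arrival node i has learned of, directly or transitively.
    """
    rounds = dissemination_rounds(num_nodes)
    # flags[i][r] models a slot in node i's memory that a peer sets with a
    # one-sided write in round r (hypothetical stand-in for an RDMA write).
    flags = [[False] * rounds for _ in range(num_nodes)]
    known = [{i} for i in range(num_nodes)]
    for r in range(rounds):
        snapshot = [set(k) for k in known]  # state at the start of the round
        for src in range(num_nodes):
            dst = (src + 2 ** r) % num_nodes
            flags[dst][r] = True            # the "remote write" into dst
            known[dst] |= snapshot[src]
        # A real node would spin on its own flags[i][r] before continuing.
        assert all(flags[i][r] for i in range(num_nodes))
    return rounds, known


def simulate_smp_barrier(num_nodes, procs_per_node):
    """Two-level barrier: shared counter inside a node, flags between nodes."""
    # Step 1: every process on a node increments that node's shared counter.
    arrived = [0] * num_nodes
    for node in range(num_nodes):
        for _ in range(procs_per_node):
            arrived[node] += 1
    assert all(count == procs_per_node for count in arrived)
    # Step 2: one leader per node runs the inter-node exchange.
    rounds, known = simulate_inter_node_barrier(num_nodes)
    # Step 3: leaders release local processes once all nodes are accounted for.
    released = all(len(k) == num_nodes for k in known)
    return rounds, released


if __name__ == "__main__":
    # 32 dual-processor nodes, matching the 64-processor system size above.
    print(simulate_smp_barrier(32, 2))  # prints (5, True)
```

In this sketch the inter-node phase takes a logarithmic number of rounds in the node count, and no receive operation is posted for any flag update. That second property is the one one-sided writes offer over a Send/Recv exchange. Actual latency would depend on the interconnect and is not modeled here.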