Scalable barrier synchronisation for large-scale shared-memory multiprocessors

  • Authors:
  • Zhen Fang;Lixin Zhang;John B. Carter;Mike Parker

  • Affiliations:
  • School of Computing, University of Utah, Salt Lake City, UT 84112, USA; IBM Austin Research Lab, 11400 Burnet Rd, MS 904/6C019, Austin, TX 78758, USA; School of Computing, University of Utah, Salt Lake City, UT 84112, USA; Cray, Inc., 1050 Lowater Road, Chippewa Falls, WI 54729, USA

  • Venue:
  • International Journal of High Performance Computing and Networking
  • Year:
  • 2004

Abstract

Barrier synchronisation is a critical operation in scalable multiprocessors. As network latency rapidly approaches thousands of processor cycles and multiprocessor systems grow ever larger, conventional barrier techniques are failing to keep up with the increasing demand for efficient synchronisation. In this paper, we present a memory controller-based operation that optimises the barrier function of an OpenMP library. The proposed mechanism allows atomic operations on the barrier variable to be executed at the home memory controller, and allows the home memory controller to send fine-grained updates to waiting processors when the barrier variable reaches certain values. On a cycle-accurate, execution-driven simulator, experimental results show that the proposed barrier implementation outperforms a conventional LL/SC (Load-Linked/Store-Conditional) version by 20.8X, a conventional processor-side atomic instruction version by 15.5X, and an active messages version by 13.4X. To the best of our knowledge, the proposed barrier achieves better performance than all other existing non-hardwired implementations, and it does so with an improved programming interface.
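For readers unfamiliar with the baseline, the following is a minimal sketch of a conventional centralised sense-reversing barrier, the general kind of software barrier that the abstract compares against. It is not the memory controller-based mechanism proposed in the paper. The class name `SenseReversingBarrier`, the helper `run_demo`, and the use of a lock to stand in for a processor-side atomic fetch-and-increment are all assumptions made for illustration.

```python
import threading
import time


class SenseReversingBarrier:
    """Illustrative centralised barrier: a shared counter plus a shared sense flag."""

    def __init__(self, num_threads):
        self.num_threads = num_threads
        self.count = 0                    # the shared barrier variable
        self.sense = False                # flipped by the last thread to arrive
        self._lock = threading.Lock()     # assumption: stands in for an atomic increment
        self._local = threading.local()   # holds each thread's private sense

    def wait(self):
        # Each thread toggles its private sense once per barrier episode.
        my_sense = not getattr(self._local, "sense", False)
        self._local.sense = my_sense

        with self._lock:
            self.count += 1
            last = self.count == self.num_threads
            if last:
                self.count = 0            # reset the counter for the next episode
                self.sense = my_sense     # release every waiting thread

        if not last:
            # Waiters spin on the shared flag; on real hardware this polling
            # is a source of the coherence traffic the paper aims to reduce.
            while self.sense != my_sense:
                time.sleep(0)             # yield so other threads can run


def run_demo(num_threads=4, rounds=20):
    """Hypothetical check: nobody leaves round r before all threads arrive."""
    barrier = SenseReversingBarrier(num_threads)
    arrivals = [0] * rounds
    arrivals_lock = threading.Lock()
    violations = []

    def worker():
        for r in range(rounds):
            with arrivals_lock:
                arrivals[r] += 1
            barrier.wait()
            if arrivals[r] != num_threads:
                violations.append(r)

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return arrivals, violations
```

In this sketch every arrival updates one shared counter and every waiter polls one shared flag. The abstract's proposal instead performs the atomic update at the home memory controller and has that controller push updates to the waiting processors, which this software-only sketch does not model.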