Algorithms for scalable synchronization on shared-memory multiprocessors

  • Authors:
  • John M. Mellor-Crummey; Michael L. Scott

  • Affiliations:
  • Rice Univ., Houston, TX; Univ. of Rochester, Rochester, NY

  • Venue:
  • ACM Transactions on Computer Systems (TOCS)
  • Year:
  • 1991

Quantified Score

Hi-index 0.02

Abstract

Busy-wait techniques are heavily used for mutual exclusion and barrier synchronization in shared-memory parallel programs. Unfortunately, typical implementations of busy-waiting tend to produce large amounts of memory and interconnect contention, introducing performance bottlenecks that become markedly more pronounced as applications scale. We argue that this problem is not fundamental, and that one can in fact construct busy-wait synchronization algorithms that induce no memory or interconnect contention. The key to these algorithms is for every processor to spin on separate locally-accessible flag variables, and for some other processor to terminate the spin with a single remote write operation at an appropriate time. Flag variables may be locally-accessible as a result of coherent caching, or by virtue of allocation in the local portion of physically distributed shared memory.

We present a new scalable algorithm for spin locks that generates O(1) remote references per lock acquisition, independent of the number of processors attempting to acquire the lock. Our algorithm provides reasonable latency in the absence of contention, requires only a constant amount of space per lock, and requires no hardware support other than a swap-with-memory instruction. We also present a new scalable barrier algorithm that generates O(1) remote references per processor reaching the barrier, and observe that two previously-known barriers can likewise be cast in a form that spins only on locally-accessible flag variables. None of these barrier algorithms requires hardware support beyond the usual atomicity of memory reads and writes.

We compare the performance of our scalable algorithms with other software approaches to busy-wait synchronization on both a Sequent Symmetry and a BBN Butterfly. Our principal conclusion is that contention due to synchronization need not be a problem in large-scale shared-memory multiprocessors. The existence of scalable algorithms greatly weakens the case for costly special-purpose hardware support for synchronization, and provides a case against so-called “dance hall” architectures, in which shared memory locations are equally far from all processors.

(From the Authors' Abstract)
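
The idea of spinning on a separate, locally-accessible flag can be illustrated with a small queue-based spin lock. What follows is a minimal sketch in C11, not the authors' published code: the names qnode, queue_lock, queue_lock_acquire, queue_lock_release and queue_lock_demo are hypothetical, and the release path uses compare-and-swap for brevity, which is more hardware support than the swap-only requirement stated in the abstract. Each waiter enqueues its own node with one atomic swap, then spins only on the flag inside that node until its predecessor clears it with a single write.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* One node per waiting thread; a thread spins only on its own node. */
typedef struct qnode {
    _Atomic(struct qnode *) next;  /* successor in the queue, if any */
    atomic_bool locked;            /* true while this thread must wait */
} qnode;

typedef struct {
    _Atomic(qnode *) tail;         /* last waiter, or NULL when the lock is free */
} queue_lock;

static void queue_lock_acquire(queue_lock *l, qnode *me) {
    atomic_store(&me->next, NULL);
    atomic_store(&me->locked, true);
    /* A single atomic swap appends this node to the queue. */
    qnode *pred = atomic_exchange(&l->tail, me);
    if (pred != NULL) {
        atomic_store(&pred->next, me);      /* link behind the predecessor */
        while (atomic_load(&me->locked)) {  /* spin on our own flag only */
            sched_yield();  /* demo nicety (assumption); a real lock would just spin */
        }
    }
}

static void queue_lock_release(queue_lock *l, qnode *me) {
    qnode *succ = atomic_load(&me->next);
    if (succ == NULL) {
        /* No visible successor: try to mark the lock free.
           Compare-and-swap is a simplification made for this sketch. */
        qnode *expected = me;
        if (atomic_compare_exchange_strong(&l->tail, &expected, NULL)) {
            return;
        }
        /* A successor swapped itself in but has not linked yet; wait for it. */
        while ((succ = atomic_load(&me->next)) == NULL) {
            sched_yield();
        }
    }
    /* One write to the successor's flag ends its spin. */
    atomic_store(&succ->locked, false);
}

enum { MAX_THREADS = 16 };

typedef struct {
    queue_lock *lock;
    long *counter;
    int iters;
} worker_arg;

static void *worker(void *p) {
    worker_arg *a = p;
    qnode me;  /* this thread's private queue node, reused across acquisitions */
    for (int i = 0; i < a->iters; i++) {
        queue_lock_acquire(a->lock, &me);
        (*a->counter)++;  /* plain increment, protected by the lock */
        queue_lock_release(a->lock, &me);
    }
    return NULL;
}

/* Runs nthreads workers that each increment a shared counter iters times
   under the lock, and returns the final counter value. */
long queue_lock_demo(int nthreads, int iters) {
    assert(nthreads > 0 && nthreads <= MAX_THREADS);
    queue_lock lock;
    atomic_init(&lock.tail, NULL);
    long counter = 0;
    pthread_t tid[MAX_THREADS];
    worker_arg arg = { &lock, &counter, iters };
    for (int i = 0; i < nthreads; i++) {
        int rc = pthread_create(&tid[i], NULL, worker, &arg);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < nthreads; i++) {
        pthread_join(tid[i], NULL);
    }
    return counter;
}
```

If mutual exclusion holds, queue_lock_demo(n, k) returns n * k. The per-thread node is what keeps the spin local: on a cache-coherent machine it sits in the waiter's own cache, and on a distributed shared-memory machine it could be allocated in the waiter's local memory. The sketch uses the default sequentially consistent atomics for simplicity rather than tuned memory orderings.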