Efficient synchronization primitives for large-scale cache-coherent multiprocessors

  • Authors:
  • James R. Goodman;Mary K. Vernon;Philip J. Woest

  • Affiliations:
  • Univ. of Wisconsin-Madison, Madison;Univ. of Wisconsin-Madison, Madison;Univ. of Wisconsin-Madison, Madison

  • Venue:
  • ASPLOS III Proceedings of the third international conference on Architectural support for programming languages and operating systems
  • Year:
  • 1989

Quantified Score

Hi-index 0.02

Visualization

Abstract

This paper proposes a set of efficient primitives for process synchronization in multiprocessors. The only assumptions made in developing the set of primitives are that hardware combining is not implemented in the inter-connect, and (in one case) that the interconnect supports broadcast.The primitives make use of synchronization bits (syncbits) to provide a simple mechanism for mutual exclusion. The proposed implementation of the primitives includes efficient (i.e. local) busy-waiting for syncbits. In addition, a hardware-supported mechanism for maintaining a first-come first-serve queue of requests for a syncbit is proposed. This queueing mechanism allows for a very efficient implementation of, as well as fair access to, binary semaphores. We also propose to implement Fetch and Add with combining in software rather than hardware. This allows an architecture to scale to a large number of processors while avoiding the cost of hardware combining.Scenarios for common synchronization events such as work queues and barriers are presented to demonstrate the generality and ease of use of the proposed primitives. The efficient implementation of the primitives is simpler if the multiprocessor has a hardware cache-consistency protocol. To illustrate this point, we outline how the primitives would be implemented in the Multicube multiprocessor [GoWo88].