Waiting algorithms for synchronization in large-scale multiprocessors

  • Authors:
  • Beng-Hong Lim;Anant Agarwal

  • Affiliations:
  • Massachusetts Institute of Technology, Cambridge;Massachusetts Institute of Technology, Cambridge

  • Venue:
  • ACM Transactions on Computer Systems (TOCS)
  • Year:
  • 1993

Quantified Score

Hi-index 0.00

Visualization

Abstract

Through analysis and experiments, this paper investigates two-phase waiting algorithms to minimize the cost of waiting for synchronization in large-scale multiprocessors. In a two-phase algorithm, a thread first waits by polling a synchronization variable. If the cost of polling reaches a limit Lpoll and further waiting is necessary, the thread is blocked, incurring an additional fixed cost, B. The choice of Lpoll is a critical determinant of the performance of two-phase algorithms. We focus on methods for statically determining Lpoll because the run-time overhead of dynamically determining Lpoll can be comparable to the cost of blocking in large-scale multiprocessor systems with lightweight threads.Our experiments show that always-block (Lpoll = 0) is a good waiting algorithm with performance that is usually close to the best of the algorithms compared. We show that even better performance can be achieved with a static choice of Lpoll based on knowledge of likely wait-time distributions. Motivated by the observation that different synchronization types exhibit different wait-time distributions, we prove that a static choice of Lpoll can yield close to optimal on-line performance against an adversary that is restricted to choosing wait times from a fixed family of probability distributions. This result allows us to make an optimal static choice of Lpoll based on synchronization type. For exponentially distributed wait times, we prove that setting Lpoll = 1n(e-1)B results in a waiting cost that is no more than e/(e-1) times the cost of an optimal off-line algorithm. For uniformly distributed wait times, we prove that setting Lpoll=1/2(square root of 5 -1)B results in a waiting cost that is no more than (square root of 5 + 1)/2 (the golden ratio) times the cost of an optimal off-line algorithm. Experimental measurements of several parallel applications on the Alewife multiprocessor simulator corroborate our theoretical findings.