A Work-Optimal Deterministic Algorithm for the Certified Write-All Problem with a Nontrivial Number of Asynchronous Processors

  • Authors:
  • Grzegorz Malewicz

  • Affiliations:
  • -

  • Venue:
  • SIAM Journal on Computing
  • Year:
  • 2005

Abstract

Martel, Park, and Subramonian [C. Martel, A. Park, and R. Subramonian, SIAM J. Comput., 21 (1992), pp. 1070--1099] posed the question of developing a work-optimal deterministic asynchronous algorithm for the fundamental load-balancing and synchronization problem called Certified Write-All (CWA). In this problem, introduced in a slightly different form by Kanellakis and Shvartsman in a PODC'89 paper [P. C. Kanellakis and A. A. Shvartsman, Distributed Computing, 5 (1992), pp. 201--247], p processors must update n memory cells and only then signal the completion of the updates. It is known that solutions to this problem can be used to simulate synchronous parallel programs on asynchronous systems with worst-case guarantees for the overhead of a simulation. Such simulations are interesting because they may increase productivity in parallel computing, since synchronous parallel programs are easier to reason about than asynchronous ones. This paper presents the first solution to the problem posed by Martel, Park, and Subramonian. Specifically, we show a deterministic asynchronous algorithm for the CWA problem with work complexity $O(n + p^4 \log n)$. This work complexity is asymptotically optimal for a nontrivial number of processors $p \leq \left(n/\log n\right)^{1/4}$. In contrast, all known deterministic algorithms require work superlinear in n when $p = n^{1/r}$ for any fixed $r \geq 1$. Our algorithm generalizes the collision principle introduced by Buss et al. [J. Buss, P. C. Kanellakis, P. L. Ragde, and A. A. Shvartsman, J. Algorithms, 20 (1996), pp. 45--86] in 1996, which has not been previously generalized despite various attempts. Each processor maintains a collection of intervals of $\{1, 2, \ldots, n\}$. Each processor repeatedly selects an interval and works from one of its tips toward the other tip until it finishes the work or collides with another processor. Collisions are detected effectively using a special Read-Modify-Write operation. In either case, the processor transforms its collection appropriately. Our analysis shows that the transformations preserve some structural properties of collections of intervals. This guarantees that work is assigned to processors in an efficient manner.
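
The collision idea described above can be illustrated with a minimal two-processor sketch. This is not the paper's algorithm, which handles p processors and collections of intervals; it only shows the basic principle of working from opposite tips of one interval and stopping on a collision. The names `Cell`, `sweep`, and `run_two_processors` are hypothetical, and the lock-protected `test_and_set` is an assumed stand-in for the Read-Modify-Write operation.

```python
import threading


class Cell:
    """One memory cell with an atomic test-and-set (assumed stand-in for RMW)."""

    def __init__(self):
        self._lock = threading.Lock()
        self.owner = None  # id of the processor that updated this cell

    def test_and_set(self, pid):
        # Atomically claim the cell; only the first claimant gets True.
        with self._lock:
            if self.owner is None:
                self.owner = pid
                return True
            return False


def sweep(cells, pid, start, step, log):
    """Work from one tip of the interval toward the other tip."""
    i = start
    while 0 <= i < len(cells):
        if not cells[i].test_and_set(pid):
            # Collision: the other processor already covers cell i and
            # everything beyond it, so every cell has been updated.
            log[pid] = ("collided", i)
            return
        i += step
    log[pid] = ("finished", None)


def run_two_processors(n):
    """Run two asynchronous processors on one interval of n cells."""
    cells = [Cell() for _ in range(n)]
    log = {}
    left = threading.Thread(target=sweep, args=(cells, 0, 0, 1, log))
    right = threading.Thread(target=sweep, args=(cells, 1, n - 1, -1, log))
    left.start()
    right.start()
    left.join()
    right.join()
    return cells, log
```

In this sketch each processor makes at most one failed claim, so the total number of operations is at most n + 2 regardless of how the two threads interleave, and a processor that stops (by finishing or colliding) can safely signal completion.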