A Work-Optimal Deterministic Algorithm for the Certified Write-All Problem with a Nontrivial Number of Asynchronous Processors

Authors:
Grzegorz Malewicz
Affiliations:
-
Venue:
SIAM Journal on Computing
Year:
2005

Citing 0
Cited 5

A tight analysis and near-optimal instances of the algorithm of Anderson and Woll
Theoretical Computer Science

Pregel: a system for large-scale graph processing
Proceedings of the 2010 ACM SIGMOD International Conference on Management of data

At-most-once semantics in asynchronous shared memory
DISC'09 Proceedings of the 23rd international conference on Distributed computing

Solving the at-most-once problem with nearly optimal effectiveness
ICDCN'12 Proceedings of the 13th international conference on Distributed Computing and Networking

The strong at-most-once problem
DISC'12 Proceedings of the 26th international conference on Distributed Computing

Abstract

Martel, Park, and Subramonian [C. Martel, A. Park, and R. Subramonian, SIAM J. Comput., 21 (1992), pp. 1070--1099] posed the question of developing a work-optimal deterministic asynchronous algorithm for the fundamental load-balancing and synchronization problem called Certified Write-All (CWA). In this problem, introduced in a slightly different form by Kanellakis and Shvartsman in a PODC'89 paper [P. C. Kanellakis and A. A. Shvartsman, Distributed Computing, 5 (1992), pp. 201--247], $p$ processors must update $n$ memory cells and only then signal the completion of the updates. It is known that solutions to this problem can be used to simulate synchronous parallel programs on asynchronous systems with worst-case guarantees for the overhead of the simulation. Such simulations are interesting because they may increase productivity in parallel computing, since synchronous parallel programs are easier to reason about than asynchronous ones. This paper gives the first answer to the question of Martel, Park, and Subramonian. Specifically, we show a deterministic asynchronous algorithm for the CWA problem with work complexity $O(n + p^4 \log n)$. This work complexity is asymptotically optimal for a nontrivial number of processors $p \leq \left(n/\log n\right)^{1/4}$. In contrast, all known deterministic algorithms require work superlinear in $n$ when $p = n^{1/r}$ for any fixed $r \geq 1$. Our algorithm generalizes the collision principle introduced by Buss et al. [J. Buss, P. C. Kanellakis, P. L. Ragde, and A. A. Shvartsman, J. Algorithms, 20 (1996), pp. 45--86] in 1996, which has not been previously generalized despite various attempts. Each processor maintains a collection of intervals of $\{1,2,\ldots,n\}$. A processor iteratively selects an interval and works from one tip of the interval toward the other tip until it finishes the work or collides with another processor. Collisions are detected efficiently using a special Read-Modify-Write operation. In either case, the processor transforms its collection appropriately. Our analysis shows that the transformations preserve certain structural properties of the collections of intervals, which guarantees that work is assigned to processors efficiently.
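
The collision idea can be illustrated with a minimal sketch of the simplest case only: two processors sharing a single interval and working from opposite tips. This is not the paper's p-processor algorithm, which maintains and transforms collections of intervals. The function name `simulate_two_processor_collision`, the random scheduler standing in for asynchrony, and the use of a test-and-set as the Read-Modify-Write operation are all assumptions made for illustration; the abstract does not specify the operation.

```python
import random


def simulate_two_processor_collision(n, schedule_seed=0):
    """Hypothetical two-processor sketch of the collision principle.

    Processor "L" walks the cells from index 0 upward and processor "R"
    from index n-1 downward. Each step is one atomic test-and-set
    (an assumed stand-in for the Read-Modify-Write operation).
    Returns (rmw_ops, owner, certified_ok).
    """
    rng = random.Random(schedule_seed)
    owner = [None] * n               # shared memory; None = not yet written
    pos = {"L": 0, "R": n - 1}       # next cell each processor will try
    step = {"L": 1, "R": -1}         # direction of travel from its tip
    active = {"L", "R"}
    rmw_ops = 0
    certified_ok = True

    def test_and_set(i, pid):
        # Models one atomic RMW: return the old value, write pid if empty.
        prev = owner[i]
        if prev is None:
            owner[i] = pid
        return prev

    while active:
        # Arbitrary interleaving: pick any still-active processor.
        pid = rng.choice(sorted(active))
        i = pos[pid]
        if i < 0 or i >= n:
            collided_or_done = True      # reached the far tip: work finished
        else:
            rmw_ops += 1
            collided_or_done = test_and_set(i, pid) is not None  # collision
        if collided_or_done:
            # The processor certifies completion here; check it is justified.
            certified_ok = certified_ok and all(o is not None for o in owner)
            active.discard(pid)
        else:
            pos[pid] = i + step[pid]

    return rmw_ops, owner, certified_ok
```

In this sketch each processor stops at its first collision, so at most two test-and-set operations fail and the total number of operations is at most n + 2 under any interleaving. A processor that collides can safely signal completion, because the cell it hit was written by the other processor, which had already covered everything from its own tip up to that cell.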