The complexity of synchronous iterative Do-All with crashes

Authors:
Chryssis Georgiou; Alexander Russell; Alex A. Shvartsman
Affiliations:
Department of Computer Science and Engineering, University of Connecticut, 371 Fairfield Rd., Unit 1155, Storrs, CT (all authors); Laboratory for Computer Science, Massachusetts Institute of Technology, 200 ... (Alex A. Shvartsman)
Venue:
Distributed Computing
Year:
2004

Abstract

The ability to cooperate on common tasks in a distributed setting is key to solving a broad range of computation problems, from distributed search such as SETI to distributed simulation and multi-agent collaboration. Do-All, an abstraction of such cooperative activity, is the problem of performing N tasks in a distributed system of P failure-prone processors. Many distributed and parallel algorithms have been developed for this problem, and several algorithm simulations have been developed by iterating Do-All algorithms. The efficiency of Do-All solutions is measured in terms of work complexity, which counts all processing steps taken by all processors. Work is ideally expressed as a function of N, P, and f, the number of processor crashes. However, the known lower bounds and the upper bounds for extant algorithms do not adequately show how work depends on f. We present the first non-trivial lower bounds for Do-All that capture the dependence of work on N, P, and f. For the model of computation where processors are able to make perfect load-balancing decisions locally, we also present matching upper bounds. We define the r-iterative Do-All problem, which abstracts the repeated use of Do-All found in typical algorithm simulations. Our f-sensitive analysis enables us to derive tight bounds for r-iterative Do-All work that are stronger than the r-fold work complexity of a single Do-All. Our approach, which models perfect load balancing, allows the analysis of specific algorithms to be divided into two parts: (i) the analysis of the cost of tolerating failures while performing work under "free" load balancing, and (ii) the analysis of the cost of implementing load balancing. We demonstrate the utility and generality of this approach by improving the analysis of two known efficient algorithms. We give an improved analysis of an efficient message-passing algorithm. We also derive a tight and complete analysis of the best known Do-All algorithm for the synchronous shared-memory model. Finally, we present a new upper bound on simulations of synchronous shared-memory algorithms on crash-prone processors.
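To make the quantities N, P, f, r and work more tangible, here is a minimal Python sketch of synchronous Do-All under idealized perfect load balancing, together with an iterated variant in the spirit of r-iterative Do-All. It is an illustration under stated assumptions, not the authors' algorithms and not their lower-bound adversary. The function names (`balanced_assignment`, `run_instance`, `iterative_do_all_work`) are hypothetical, as are the round-robin balancing rule, the naive greedy crash strategy, and the convention of charging one unit of work to every processor alive at the start of a step. It also assumes f < P, so that at least one processor never crashes.

```python
"""Toy model of synchronous (r-iterative) Do-All with crashes.

Illustrative assumptions (hypothetical, not the paper's exact model):
  * each live processor performs one task per synchronous step;
  * an idealized oracle balances live processors evenly over remaining tasks;
  * a naive greedy adversary crashes whole groups of processors assigned
    to the least-covered tasks, within a total budget of f crashes;
  * work charges one unit to every processor alive at the start of a step.
"""


def balanced_assignment(live, remaining):
    """Spread live processors round-robin so task coverage differs by <= 1."""
    assignment = {t: [] for t in remaining}
    for i, p in enumerate(live):
        assignment[remaining[i % len(remaining)]].append(p)
    return assignment


def run_instance(n_tasks, live, budget):
    """Perform one Do-All instance; return (work, survivors, budget left)."""
    remaining = list(range(n_tasks))
    work = 0
    while remaining:
        assignment = balanced_assignment(live, remaining)
        work += len(live)  # every processor alive at step start is charged
        covered = sorted((t for t in remaining if assignment[t]),
                         key=lambda t: len(assignment[t]))
        crashed = set()
        for t in covered:
            group = assignment[t]
            # Groups are sorted by size, so once one is unaffordable, the
            # rest are too. Never crash the last live processor.
            if len(group) > budget or len(crashed) + len(group) >= len(live):
                break
            crashed.update(group)
            budget -= len(group)
        live = [p for p in live if p not in crashed]
        # A task stays undone only if every processor assigned to it crashed.
        remaining = [t for t in remaining
                     if all(p in crashed for p in assignment[t])]
    return work, live, budget


def iterative_do_all_work(r, n_tasks, n_procs, f):
    """r-iterative Do-All: r instances run back to back, sharing one budget f."""
    if not 0 <= f < n_procs:
        raise ValueError("require 0 <= f < P so that some processor survives")
    live, budget, total = list(range(n_procs)), f, 0
    for _ in range(r):
        work, live, budget = run_instance(n_tasks, live, budget)
        total += work
    return total


if __name__ == "__main__":
    N, P = 64, 16
    for f in (0, 4, 8, 15):
        # With f = 0 there are no crashes, so work is exactly r * N here.
        print(f, iterative_do_all_work(1, N, P, f), iterative_do_all_work(3, N, P, f))
```

In this toy setting a crash costs the step it wastes, plus later steps in which the survivors outnumber the remaining tasks and duplicate work. The greedy adversary spends its budget early, so the sketch shows how work can depend on f but is not meant to reproduce the bounds stated above.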