The ability to cooperate on common tasks in a distributed setting is key to solving a broad range of computation problems, from distributed search such as SETI to distributed simulation and multi-agent collaboration. Do-All, an abstraction of such cooperative activity, is the problem of performing N tasks in a distributed system of P failure-prone processors. Many distributed and parallel algorithms have been developed for this problem, and several algorithm simulations have been developed by iterating Do-All algorithms. The efficiency of solutions for Do-All is measured in terms of work complexity, where all processing steps taken by all processors are counted. Work is ideally expressed as a function of N, P, and f, the number of processor crashes. However, the known lower bounds and the upper bounds for extant algorithms do not adequately show how work depends on f. We present the first non-trivial lower bounds for Do-All that capture the dependence of work on N, P, and f. For the model of computation where processors are able to make perfect load-balancing decisions locally, we also present matching upper bounds. We define the r-iterative Do-All problem, which abstracts the repeated use of Do-All such as that found in typical algorithm simulations. Our f-sensitive analysis enables us to derive tight bounds for r-iterative Do-All work that are stronger than the r-fold work complexity of a single Do-All. Our approach of modeling perfect load-balancing allows the analysis of specific algorithms to be divided into two parts: (i) the analysis of the cost of tolerating failures while performing work under "free" load-balancing, and (ii) the analysis of the cost of implementing load-balancing. We demonstrate the utility and generality of this approach by improving the analysis of two known efficient algorithms: we give an improved analysis of an efficient message-passing algorithm, and we derive a tight and complete analysis of the best known Do-All algorithm for the synchronous shared-memory model. Finally, we present a new upper bound on simulations of synchronous shared-memory algorithms on crash-prone processors.
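To make the work measure concrete, the following toy simulation counts work for a synchronous Do-All run under "free" perfect load-balancing. It is only an illustrative sketch and does not reproduce the paper's algorithms or bounds. The function name `do_all_work`, the `crash_schedule` parameter, and the adversary model (processors crash mid-step, after being charged for the step but before completing their task) are all assumptions introduced here.

```python
def do_all_work(n_tasks, n_procs, crash_schedule=None):
    """Toy model: count work for synchronous Do-All with an oracle balancer.

    crash_schedule (hypothetical): maps a step index to the number of
    processors the adversary crashes during that step. At least one
    processor is always kept alive so the tasks can be finished.
    Returns (work, steps), where work counts every step taken by every
    processor that was live at the start of the step.
    """
    crash_schedule = crash_schedule or {}
    remaining = n_tasks
    live = n_procs
    work = 0
    step = 0
    while remaining > 0:
        # Every processor live at the start of the step is charged one unit.
        work += live
        # Assumed adversary: crashes happen mid-step, so the crashed
        # processors' tasks for this step are not completed.
        crashes = min(crash_schedule.get(step, 0), live - 1)
        live -= crashes
        # Perfect load-balancing: survivors complete distinct undone tasks.
        remaining -= min(live, remaining)
        step += 1
    return work, step


if __name__ == "__main__":
    print(do_all_work(8, 4))          # failure-free run
    print(do_all_work(8, 4, {0: 2}))  # two crashes in the first step
    print(do_all_work(8, 4, {0: 3}))  # three crashes in the first step
```

In this toy model, work grows with the number of crashes f even though load-balancing is free, which is the kind of f-dependence the abstract refers to; the cost of actually implementing load-balancing is deliberately left out.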