Efficient robust parallel computations
STOC '90 Proceedings of the twenty-second annual ACM symposium on Theory of computing
Wait-free parallel algorithms for the union-find problem
STOC '91 Proceedings of the twenty-third annual ACM symposium on Theory of computing
Combining tentative and definite executions for very fast dependable parallel computing
STOC '91 Proceedings of the twenty-third annual ACM symposium on Theory of computing
Work-optimal asynchronous algorithms for shared memory parallel computers
SIAM Journal on Computing
On the complexity of certified write-all algorithms
Journal of Algorithms
Time-optimal message-efficient work performance in the presence of faults
PODC '94 Proceedings of the thirteenth annual ACM symposium on Principles of distributed computing
Parallel algorithms with processor failures and delays
Journal of Algorithms
Performing Work Efficiently in the Presence of Faults
SIAM Journal on Computing
Fault-tolerant broadcasts and related problems
Distributed systems (2nd Ed.)
Fault-Tolerant Parallel Computation
Performing Tasks on Restartable Message-Passing Processors
WDAG '97 Proceedings of the 11th International Workshop on Distributed Algorithms
Resolving message complexity of Byzantine Agreement and beyond
FOCS '95 Proceedings of the 36th Annual Symposium on Foundations of Computer Science
Efficient parallel algorithms can be made robust
Distributed Computing
Bounding Work and Communication in Robust Cooperative Computation
DISC '02 Proceedings of the 16th International Conference on Distributed Computing
Optimal F-Reliable Protocols for the Do-All Problem on Single-Hop Wireless Networks
ISAAC '02 Proceedings of the 13th International Symposium on Algorithms and Computation
The problem of performing t tasks in a distributed system of p processors is studied. The tasks are assumed to be independent, similar (each takes one step to complete), and idempotent (each can be performed many times and concurrently). The processors communicate by passing messages, and each of them may fail. This problem, usually called Do-All, was introduced by Dwork, Halpern and Waarts. The distributed setting considered in this paper is as follows: the system is synchronous, processors fail by stopping, and reliable multicast is available. The occurrence of faults is modeled by an adversary who has to choose at least c · p processors prior to the start of the computation, for a fixed constant 0 < c < 1. The main result shows that there is a sharp difference between the expected performance of randomized algorithms and the worst-case performance of deterministic algorithms solving the Do-All problem in such a setting. Performance is measured in terms of the work and communication of algorithms. Work is the total number of steps performed by all the processors while they are operational, including idling. Communication is the total number of point-to-point messages exchanged. Let effort be the sum of work and communication. A randomized algorithm is developed whose expected effort is O(t + p · (1 + log* p − log*(p/t))), where log* x is the number of iterations of the log function required to bring the value down to at most 1. For deterministic algorithms, a worst-case lower bound of Ω(t + p log t / log log t) on work holds, and it is matched by the work performed by a simple algorithm.
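The iterated logarithm log* appearing in the effort bound can be illustrated by a short sketch; the function name and the choice of base 2 are illustrative assumptions, not taken from the paper:

```python
import math

def log_star(x: float) -> int:
    """Iterated logarithm (base 2): the number of times log2 must be
    applied to x before the value drops to at most 1."""
    count = 0
    while x > 1:
        x = math.log2(x)
        count += 1
    return count

# log* grows extremely slowly: even for very large x it stays tiny.
# log_star(16) == 3  (16 -> 4 -> 2 -> 1)
# log_star(65536) == 4  (65536 -> 16 -> 4 -> 2 -> 1)
```

This extremely slow growth is why the expected effort O(t + p · (1 + log* p − log*(p/t))) of the randomized algorithm is close to the trivial Ω(t + p) bound, in contrast with the Ω(t + p log t / log log t) deterministic lower bound.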