Do-All is the abstract problem of using n processors to cooperatively perform m independent tasks in the presence of failures. This problem and its derivatives have been a centerpiece in the study of trade-offs between efficiency and fault-tolerance in cooperative computing environments. Many algorithms have been developed for Do-All in various models of computation, including message-passing, partitionable networks, and shared-memory models under a variety of failure models.

This work initiates the study of the Do-All problem for synchronous message-passing processors prone to Byzantine failures. In particular, upper and lower bounds are given on the complexity of Do-All for several cases: (a) the case where the maximum number of faulty processors f is known a priori, (b) the case where f is not known, (c) the case where a task execution can be verified (without re-executing the task), and (d) the case where task executions cannot be verified. The efficiency of algorithms is evaluated in terms of work and message complexities. The work complexity accounts for all computational steps taken by the processors and the message complexity accounts for all messages sent by the processors during the computation. The work and messages of a faulty processor are counted only until the processor fails to follow the algorithm. It is shown that in some cases obtaining work Θ(mn) is the best one can do. It is also shown that in certain cases communication cannot help improve work efficiency.
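To make the work and message measures concrete, the sketch below counts them for the simple baseline in which every processor performs every task locally and never communicates. This is an illustration under stated assumptions, not an algorithm from this work: the function name `trivial_do_all` and the `deviates_at` parameter (a map from a faulty processor to the number of steps it takes before it stops following the algorithm) are hypothetical, and one task execution is assumed to cost one step.

```python
def trivial_do_all(n, m, deviates_at=None):
    """Count work and messages for the baseline where each of n processors
    performs all m tasks itself (illustrative sketch, not the paper's algorithm).

    deviates_at: hypothetical map {processor id: steps taken before the
    processor stops following the algorithm}. Processors not in the map
    are treated as correct.
    Returns (work, messages, all_tasks_done).
    """
    deviates_at = deviates_at or {}
    work = 0
    messages = 0  # this baseline never sends a message
    done = set()
    for p in range(n):
        # A faulty processor's steps are counted only until it deviates.
        counted = min(m, deviates_at.get(p, m))
        work += counted
        if p not in deviates_at:
            # Only executions by correct processors are relied upon.
            done.update(range(m))
    return work, messages, len(done) == m


if __name__ == "__main__":
    print(trivial_do_all(4, 10))                # no faults
    print(trivial_do_all(4, 10, {0: 3, 1: 0}))  # two faulty processors
```

With no faults the counted work is exactly m·n and the message count is zero, so this baseline shows that work O(mn) is always attainable without communication as long as at least one processor is correct.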