This work considers the problem of performing $t$ tasks in a distributed system of $p$ fault-prone processors. This problem, called DO-ALL herein, was introduced by Dwork, Halpern and Waarts. The solutions presented here are for a model of computation that abstracts a synchronous message-passing distributed system with processor stop-failures and restarts. We present two new algorithms based on a new aggressive coordination paradigm in which multiple coordinators may be active as a result of failures. The first algorithm tolerates $f < p$ stop-failures and does not allow restarts. Its available processor steps (work) complexity is $S = O((t + p \log p / \log\log p) \cdot \log f)$ and its message complexity is $M = O(t + p \log p / \log\log p + fp)$. Unlike prior solutions, our algorithm uses redundant broadcasts when encountering failures and, for $p = t$ and large $f$, it achieves better work complexity. This algorithm is used as the basis for another algorithm that tolerates stop-failures and restarts. This new algorithm is the first solution for the DO-ALL problem that efficiently deals with processor restarts. Its available processor steps complexity is $S = O((t + p \log p + f) \cdot \min\{\log p, \log f\})$, and its message complexity is $M = O(t + p \log p + fp)$, where $f$ is the total number of failures.
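To get a rough feel for how the stated bounds scale, the expressions inside the O(·) can be evaluated directly. The sketch below is only illustrative: the function names are hypothetical, the hidden constants are dropped, and base-2 logarithms are assumed (the abstract does not fix a base). It does not implement either algorithm.

```python
import math


def _check(t, p, f):
    # Assumption: p >= 4 keeps log log p positive, f >= 2 keeps log f positive.
    if t < 1 or p < 4 or f < 2:
        raise ValueError("expected t >= 1, p >= 4, f >= 2")


def work_no_restarts(t, p, f):
    """(t + p log p / log log p) * log f, with constants dropped."""
    _check(t, p, f)
    lg_p = math.log2(p)
    return (t + p * lg_p / math.log2(lg_p)) * math.log2(f)


def messages_no_restarts(t, p, f):
    """t + p log p / log log p + f p, with constants dropped."""
    _check(t, p, f)
    lg_p = math.log2(p)
    return t + p * lg_p / math.log2(lg_p) + f * p


def work_with_restarts(t, p, f):
    """(t + p log p + f) * min(log p, log f), with constants dropped."""
    _check(t, p, f)
    lg_p = math.log2(p)
    return (t + p * lg_p + f) * min(lg_p, math.log2(f))


def messages_with_restarts(t, p, f):
    """t + p log p + f p, with constants dropped."""
    _check(t, p, f)
    return t + p * math.log2(p) + f * p


if __name__ == "__main__":
    # Example: p = t = 16 processors and tasks, f = 4 failures.
    for fn in (work_no_restarts, messages_no_restarts,
               work_with_restarts, messages_with_restarts):
        print(fn.__name__, fn(16, 16, 4))
```

In the restart model $f$ counts all failures over the run and so may exceed $p$; the $\min\{\log p, \log f\}$ factor then caps the multiplicative overhead at $\log p$.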