Optimal time randomized consensus—making resilient algorithms fast in practice
SODA '91 Proceedings of the second annual ACM-SIAM symposium on Discrete algorithms
Time-optimal message-efficient work performance in the presence of faults
PODC '94 Proceedings of the thirteenth annual ACM symposium on Principles of distributed computing
Totem: a fault-tolerant multicast group communication system
Communications of the ACM
The Transis approach to high availability cluster communication
Communications of the ACM
Horus: a flexible group communication system
Communications of the ACM
Dynamic voting for consistent primary components
PODC '97 Proceedings of the sixteenth annual ACM symposium on Principles of distributed computing
A dynamic view-oriented group communication service
PODC '98 Proceedings of the seventeenth annual ACM symposium on Principles of distributed computing
Performing Work Efficiently in the Presence of Faults
SIAM Journal on Computing
Specifying and using a partitionable group communication service
ACM Transactions on Computer Systems (TOCS)
Group communication specifications: a comprehensive study
ACM Computing Surveys (CSUR)
Distributed Algorithms
Fault-Tolerant Parallel Computation
Reliable Distributed Computing with the ISIS Toolkit
Multicast Group Communication as a Base for a Load-Balancing Replicated Data Service
DISC '98 Proceedings of the 12th International Symposium on Distributed Computing
The Bancomat Problem: An Example of Resource Allocation in a Partitionable Asynchronous System
DISC '98 Proceedings of the 12th International Symposium on Distributed Computing
Distributed Cooperation During the Absence of Communication
DISC '00 Proceedings of the 14th International Conference on Distributed Computing
Resolving message complexity of Byzantine Agreement and beyond
FOCS '95 Proceedings of the 36th Annual Symposium on Foundations of Computer Science
HICSS '95 Proceedings of the 28th Hawaii International Conference on System Sciences
Newtop: a fault-tolerant group communication protocol
ICDCS '95 Proceedings of the 15th International Conference on Distributed Computing Systems
The ensemble system
Performing tasks on synchronous restartable message-passing processors
Distributed Computing
Optimally work-competitive scheduling for cooperative computing with merging groups
Proceedings of the twenty-first annual symposium on Principles of distributed computing
Work-competitive scheduling for cooperative computing with dynamic groups
Proceedings of the thirty-fifth annual ACM symposium on Theory of computing
The complexity of synchronous iterative Do-All with crashes
Distributed Computing
The Do-All problem with Byzantine processor failures
Theoretical Computer Science - Foundations of software science and computation structures
Dynamic load balancing with group communication
Theoretical Computer Science
Emulating shared-memory Do-All algorithms in asynchronous message-passing systems
Journal of Parallel and Distributed Computing
This work considers the problem of performing a set of N tasks on a set of P cooperating message-passing processors (P ≤ N). The processors use a group communication service (GCS) to coordinate their activity in a setting where dynamic changes in the underlying network topology cause the processor groups to change over time. GCSs have been recognized as effective building blocks for fault-tolerant applications in such settings. Our results explore the efficiency of fault-tolerant cooperative computation using GCSs. The original investigation of this area (Dolev et al., Dynamic load balancing with group communication, in: Proc. of the 6th International Colloquium on Structural Information and Communication Complexity, 1999) focused on competitive lower bounds, non-redundant task allocation schemes, and work-efficient algorithms in the presence of fragmentation regroupings. In this work we investigate work-efficient and message-efficient algorithms for both fragmentation and merge regroupings. We present an algorithm that uses a GCS and implements a coordinator-based strategy. For the analysis of the algorithm we introduce the notion of view-graphs, which represent the partially ordered view-evolution history witnessed by the processors. For fragmentations and merges, the work of the algorithm (defined as the worst-case total number of task executions, counting multiplicities) is at most min{N·f + N, N·P}, and the message complexity is at most 4(N·f + N + P·m), where f and m denote the numbers of new groups created by fragmentations and merges, respectively. Note that the constants are small and that, interestingly, while work depends on the number of groups f created by fragmentations, it does not depend on the number of groups m created by merges.
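The two bounds above can be written down directly as functions of the abstract's parameters. The following sketch is illustrative only (the function names are our own, not from the paper); it simply evaluates the stated worst-case work and message bounds for given N, P, f, and m:

```python
# Illustrative sketch of the paper's stated complexity bounds.
# N = number of tasks, P = number of processors (P <= N),
# f = new groups created by fragmentations, m = new groups created by merges.

def work_bound(n_tasks: int, n_procs: int, f: int) -> int:
    """Worst-case work: min{N*f + N, N*P} task executions (with multiplicities)."""
    return min(n_tasks * f + n_tasks, n_tasks * n_procs)

def message_bound(n_tasks: int, n_procs: int, f: int, m: int) -> int:
    """Worst-case message complexity: 4(N*f + N + P*m)."""
    return 4 * (n_tasks * f + n_tasks + n_procs * m)

# With no regroupings (f = m = 0), work is exactly N and messages are 4N.
print(work_bound(100, 10, 0))        # → 100
print(message_bound(100, 10, 0, 0))  # → 400
```

Note how merges (m) raise only the message bound, not the work bound, matching the observation at the end of the abstract.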