Optimal time randomized consensus—making resilient algorithms fast in practice
SODA '91 Proceedings of the second annual ACM-SIAM symposium on Discrete algorithms
Time-optimal message-efficient work performance in the presence of faults
PODC '94 Proceedings of the thirteenth annual ACM symposium on Principles of distributed computing
Totem: a fault-tolerant multicast group communication system
Communications of the ACM
The Transis approach to high availability cluster communication
Communications of the ACM
Horus: a flexible group communication system
Communications of the ACM
Dynamic voting for consistent primary components
PODC '97 Proceedings of the sixteenth annual ACM symposium on Principles of distributed computing
A dynamic view-oriented group communication service
PODC '98 Proceedings of the seventeenth annual ACM symposium on Principles of distributed computing
Performing Work Efficiently in the Presence of Faults
SIAM Journal on Computing
Specifying and using a partitionable group communication service
ACM Transactions on Computer Systems (TOCS)
Group communication specifications: a comprehensive study
ACM Computing Surveys (CSUR)
Distributed Algorithms
Fault-Tolerant Parallel Computation
Reliable Distributed Computing with the ISIS Toolkit
Multicast Group Communication as a Base for a Load-Balancing Replicated Data Service
DISC '98 Proceedings of the 12th International Symposium on Distributed Computing
The Bancomat Problem: An Example of Resource Allocation in a Partitionable Asynchronous System
DISC '98 Proceedings of the 12th International Symposium on Distributed Computing
Distributed Cooperation During the Absence of Communication
DISC '00 Proceedings of the 14th International Conference on Distributed Computing
Resolving message complexity of Byzantine Agreement and beyond
FOCS '95 Proceedings of the 36th Annual Symposium on Foundations of Computer Science
HICSS '95 Proceedings of the 28th Hawaii International Conference on System Sciences
Newtop: a fault-tolerant group communication protocol
ICDCS '95 Proceedings of the 15th International Conference on Distributed Computing Systems
The ensemble system
Performing tasks on synchronous restartable message-passing processors
Distributed Computing
Optimally work-competitive scheduling for cooperative computing with merging groups
Proceedings of the twenty-first annual symposium on Principles of distributed computing
Work-competitive scheduling for cooperative computing with dynamic groups
Proceedings of the thirty-fifth annual ACM symposium on Theory of computing
The complexity of synchronous iterative Do-All with crashes
Distributed Computing
The Do-All problem with Byzantine processor failures
Theoretical Computer Science - Foundations of software science and computation structures
Dynamic load balancing with group communication
Theoretical Computer Science
Emulating shared-memory Do-All algorithms in asynchronous message-passing systems
Journal of Parallel and Distributed Computing
This work considers the problem of performing a set of N tasks on a set of P cooperating message-passing processors (P ≤ N). The processors use a group communication service (GCS) to coordinate their activity in a setting where dynamic changes in the underlying network topology cause the processor groups to change over time. GCSs have been recognized as effective building blocks for fault-tolerant applications in such settings. Our results explore the efficiency of fault-tolerant cooperative computation using GCSs. The original investigation of this area (Dolev et al., Dynamic load balancing with group communication, in: Proc. of the 6th International Colloquium on Structural Information and Communication Complexity, 1999) focused on competitive lower bounds, non-redundant task allocation schemes, and work-efficient algorithms in the presence of fragmentation regroupings. In this work we investigate work-efficient and message-efficient algorithms for both fragmentation and merge regroupings. We present an algorithm that uses a GCS and implements a coordinator-based strategy. For the analysis of the algorithm we introduce the notion of view-graphs, which represent the partially ordered view-evolution history witnessed by the processors. For fragmentations and merges, the work of the algorithm (defined as the worst-case total number of task executions, counting multiplicities) is at most min{N·f + N, N·P}, and the message complexity is at most 4(N·f + N + P·m), where f and m denote the numbers of new groups created by fragmentations and merges, respectively. Note that the constants are small and that, interestingly, while work depends on the number of groups f created by fragmentations, it does not depend on the number of groups m created by merges.
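The two bounds above can be written down directly as functions of the abstract's parameters. The following sketch is illustrative only (the function names are our own, not from the paper); it simply evaluates the stated worst-case work and message bounds for given N, P, f, and m:

```python
# Illustrative sketch of the paper's stated complexity bounds.
# N = number of tasks, P = number of processors (P <= N),
# f = new groups created by fragmentations, m = new groups created by merges.

def work_bound(n_tasks: int, n_procs: int, f: int) -> int:
    """Worst-case work: min{N*f + N, N*P} task executions (with multiplicities)."""
    return min(n_tasks * f + n_tasks, n_tasks * n_procs)

def message_bound(n_tasks: int, n_procs: int, f: int, m: int) -> int:
    """Worst-case message complexity: 4(N*f + N + P*m)."""
    return 4 * (n_tasks * f + n_tasks + n_procs * m)

# With no regroupings (f = m = 0), work is exactly N and messages are 4N.
print(work_bound(100, 10, 0))        # → 100
print(message_bound(100, 10, 0, 0))  # → 400
```

Note how merges (m) raise only the message bound, not the work bound, matching the observation at the end of the abstract.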