The Totem single-ring ordering and membership protocol
ACM Transactions on Computer Systems (TOCS)
The Transis approach to high availability cluster communication
Communications of the ACM
Horus: a flexible group communication system
Communications of the ACM
On the impossibility of group membership
PODC '96 Proceedings of the fifteenth annual ACM symposium on Principles of distributed computing
Specifying and using a partitionable group communication service
ACM Transactions on Computer Systems (TOCS)
Chord: A scalable peer-to-peer lookup service for internet applications
Proceedings of the 2001 conference on Applications, technologies, architectures, and protocols for computer communications
Group communication specifications: a comprehensive study
ACM Computing Surveys (CSUR)
Building Secure and Reliable Network Applications
Building Secure and Reliable Network Applications
Moshe: A group membership service for WANs
ACM Transactions on Computer Systems (TOCS)
Evaluating the running time of a communication round over the internet
Proceedings of the twenty-first annual symposium on Principles of distributed computing
Design and Performance of Horus: A Lightweight Group Communications System
Design and Performance of Horus: A Lightweight Group Communications System
Group membership: a novel approach and the first single-round algorithm
Proceedings of the twenty-third annual ACM symposium on Principles of distributed computing
Scalable, fault tolerant membership for MPI tasks on HPC systems
Proceedings of the 20th annual international conference on Supercomputing
Hi-index | 0.00 |
Sigma, the first single-round group membership (GM) algorithm, was recently introduced and demonstrated to operate consistently with theoretical expectations in a simulated WAN environment. Sigma achieved similar quality of membership configurations as existing algorithms but required fewer message exchange rounds. We now consider Sigma in terms of scalability. Sigma involves all-to-all (A2A) type of communication among members. A2A protocols have been shown to perform worse than leader-based (LB) protocols in certain networks, due to greater message overhead and higher likelihood of message loss. Thus, although LB protocols often involve additional communication steps, they can be more efficient in practice, particularly in fault-prone networks with large numbers of participating nodes. In this paper, we present Leader-Based Sigma, which transforms the original all-to-all version into a more scalable centralized communication scheme, and discuss the rounds vs. messages tradeoff involved in optimizing GM algorithms for deployment in large-scale, fault-prone dynamic network environments.