We present Moshe, a novel, scalable group membership algorithm built specifically for wide area networks (WANs), which can suffer partitions. Moshe has three significant new features that are important in this setting: it avoids delivering views that reflect out-of-date memberships; it requires a single round of messages in the common case; and it employs a client-server design for scalability. Furthermore, Moshe's interface supplies the hooks needed to provide clients with full virtual synchrony semantics. We have implemented Moshe on top of a network event mechanism that is also designed specifically for use in a WAN. In addition to specifying the properties of the algorithm and proving that it meets this specification, we present empirical results from an implementation of Moshe running over the Internet. These results justify the assumptions made by our design and show good performance. In particular, Moshe terminates within a single communication round over 98% of the time. The experimental results also lead to interesting observations regarding the performance of membership algorithms over the Internet.