A highly available large-scale service cluster often needs to discover new nodes and identify failed nodes quickly in order to handle a high volume of traffic. Determining node membership promptly in such an environment is critical for location-transparent service invocation, load balancing, and failure shielding. In this paper, we present a topology-adaptive hierarchical membership service that dynamically divides the cluster into membership groups based on the network topology among nodes, so that the liveness of the nodes in each group can be published to the others efficiently. We compare the proposed approach with two alternatives: an all-to-all multicast approach and a gossip-based approach. The results show that the proposed approach is scalable and achieves high membership accuracy, short view convergence time, and low communication cost.
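The abstract does not spell out the protocol, so the following is only a minimal sketch, under stated assumptions, of why topology-based grouping lowers communication cost compared with all-to-all heartbeats. It approximates "network topology" by IPv4 /24 subnet (a hypothetical stand-in for switch locality). It picks the lowest address in each group as a hypothetical leader and assumes a two-level scheme: heartbeats within each group, summaries exchanged among leaders, and leaders relaying remote summaries to their members. The cost function counts heartbeat deliveries (receptions) per round. All function names and the relay scheme are illustrative, not the paper's algorithm.

```python
from collections import defaultdict
from ipaddress import ip_address, ip_network


def group_by_subnet(nodes, prefix_len=24):
    """Partition node addresses into membership groups keyed by subnet.

    Assumption: nodes sharing a /prefix_len subnet sit behind the same
    switch, so the subnet serves as a crude proxy for network topology.
    """
    groups = defaultdict(list)
    for addr in nodes:
        net = ip_network(f"{addr}/{prefix_len}", strict=False)
        groups[net].append(addr)
    return dict(groups)


def pick_leaders(groups):
    """Hypothetical rule: the numerically lowest address leads its group."""
    return {net: min(members, key=ip_address) for net, members in groups.items()}


def all_to_all_deliveries(n):
    """Heartbeat receptions per round when every node hears every other node."""
    return n * (n - 1)


def hierarchical_deliveries(groups):
    """Heartbeat receptions per round in an assumed two-level scheme."""
    sizes = [len(members) for members in groups.values()]
    intra = sum(g * (g - 1) for g in sizes)  # all-to-all inside each group
    num_leaders = len(sizes)
    inter = num_leaders * (num_leaders - 1)  # leaders exchange group summaries
    relay = sum(g - 1 for g in sizes)        # each leader informs its members
    return intra + inter + relay


if __name__ == "__main__":
    # 100 nodes spread evenly over four /24 subnets.
    nodes = [f"10.0.{s}.{h}" for s in range(4) for h in range(1, 26)]
    groups = group_by_subnet(nodes)
    print(len(groups))                        # 4
    print(all_to_all_deliveries(len(nodes)))  # 9900
    print(hierarchical_deliveries(groups))    # 2508
```

In this toy setting, grouping cuts per-round receptions from 9,900 to 2,508, because the quadratic term now grows with group size rather than cluster size. A real topology-adaptive service would also have to re-form groups as nodes join and fail, which this sketch does not model.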