A Gossip-Style Failure Detection Service

Authors:
Robbert Van Renesse;Yaron Minsky;Mark Hayden
Venue:
A Gossip-Style Failure Detection Service
Year:
1998

Cited 25

Moshe: A group membership service for WANs
ACM Transactions on Computer Systems (TOCS)

Probabilistic Reliable Dissemination in Large-Scale Systems
IEEE Transactions on Parallel and Distributed Systems

A Problem-Specific Fault-Tolerance Mechanism for Asynchronous, Distributed Systems
ICPP '00 Proceedings of the 2000 International Conference on Parallel Processing

The feasibility of supporting large-scale live streaming applications with dynamic application end-points
Proceedings of the 2004 conference on Applications, technologies, architectures, and protocols for computer communications

Détection de partition pour la gestion de groupes en environnement mobile
UbiMob '05 Proceedings of the 2nd French-speaking conference on Mobility and ubiquity computing

Scalable information dissemination for pervasive systems: implementation and evaluation
Proceedings of the 4th international workshop on Middleware for Pervasive and Ad-Hoc Computing (MPAC 2006)

Evaluation of the QoS of crash-recovery failure detection
Proceedings of the 2007 ACM symposium on Applied computing

Early experience with an internet broadcast system based on overlay multicast
ATEC '04 Proceedings of the annual conference on USENIX Annual Technical Conference

BAR gossip
OSDI '06 Proceedings of the 7th symposium on Operating systems design and implementation

Compositional gossip: a conceptual architecture for designing gossip-based applications
ACM SIGOPS Operating Systems Review - Gossip-based computer networking

Grouping algorithms for scalable self-monitoring distributed systems
Autonomics '08 Proceedings of the 2nd International Conference on Autonomic Computing and Communication Systems

Failure detectors for wireless sensor-actuator systems
Ad Hoc Networks

VolpexMPI: An MPI Library for Execution of Parallel Applications on Volatile Nodes
Proceedings of the 16th European PVM/MPI Users' Group Meeting on Recent Advances in Parallel Virtual Machine and Message Passing Interface

A gossip-style failure detection service
Middleware '98 Proceedings of the IFIP International Conference on Distributed Systems Platforms and Open Distributed Processing

Self-healing network for scalable fault-tolerant runtime environments
Future Generation Computer Systems

Network-Friendly Gossiping
SSS '09 Proceedings of the 11th International Symposium on Stabilization, Safety, and Security of Distributed Systems

Optimizing information flow in the gossip objects platform
ACM SIGOPS Operating Systems Review

Autonomous and scalable failure detection in distributed systems
International Journal of Autonomous and Adaptive Communications Systems

Gossiping for autonomic estimation of network-based parameters in dynamic environments
OTM'10 Proceedings of the 2010 international conference on On the move to meaningful internet systems

What model and what conditions to implement unreliable failure detectors in dynamic networks?
Proceedings of the 3rd International Workshop on Theoretical Aspects of Dynamic Distributed Systems

Experimental evaluation of a failure detection service based on a gossip strategy
ICA3PP'11 Proceedings of the 11th international conference on Algorithms and architectures for parallel processing - Volume Part II

Scalable fault tolerant protocol for parallel runtime environments
EuroPVM/MPI'06 Proceedings of the 13th European PVM/MPI User's Group conference on Recent advances in parallel virtual machine and message passing interface

A peer-to-peer framework for robust execution of message passing parallel programs on grids
PVM/MPI'05 Proceedings of the 12th European PVM/MPI users' group conference on Recent Advances in Parallel Virtual Machine and Message Passing Interface

Intelligent dependability services for overlay networks
DAIS'06 Proceedings of the 6th IFIP WG 6.1 international conference on Distributed Applications and Interoperable Systems

Bounded gossip: a gossip protocol for large-scale datacenters
Proceedings of the 28th Annual ACM Symposium on Applied Computing

Abstract

Failure detection is valuable for system management, replication, load balancing, and other distributed services. To date, failure detection services have scaled poorly with the number of members being monitored. This paper describes a new gossip-based protocol that scales well and provides timely detection. We analyze the protocol, then extend it to discover and leverage the underlying network topology for much better resource utilization. Finally, we combine it with a second, broadcast-based protocol that handles partition failures.
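
The abstract does not spell out the protocol's mechanics, so the following is only a minimal sketch of the general idea, assuming the heartbeat-counter scheme usually associated with gossip-style failure detection: each member periodically increments its own counter, sends its table to one randomly chosen peer, and suspects any member whose counter has not increased for a timeout. The names `Member`, `t_fail`, `gossip_round`, and the tick-based clock are hypothetical and chosen for illustration. The topology-aware extension and the broadcast-based partition handling mentioned in the abstract are not modeled, and neither is cleanup of old entries.

```python
import random


class Member:
    """One participant in a simplified gossip-style failure detector."""

    def __init__(self, name, t_fail):
        self.name = name
        self.t_fail = t_fail  # hypothetical timeout, in simulated ticks
        # name -> (highest heartbeat seen, local time it last increased)
        self.table = {name: (0, 0)}

    def tick(self, now):
        """Increment our own heartbeat counter."""
        beat, _ = self.table[self.name]
        self.table[self.name] = (beat + 1, now)

    def merge(self, other_table, now):
        """Adopt every heartbeat that is higher than the one we hold."""
        for name, (beat, _) in other_table.items():
            known = self.table.get(name)
            if known is None or beat > known[0]:
                self.table[name] = (beat, now)

    def suspects(self, now):
        """Members whose heartbeat has not increased for t_fail ticks."""
        return sorted(
            name
            for name, (_, seen) in self.table.items()
            if name != self.name and now - seen >= self.t_fail
        )


def gossip_round(members, now, rng=random):
    """Each member bumps its heartbeat and gossips to one random peer.

    Assumes at least two members; message loss and crashes are not modeled.
    """
    for m in members:
        m.tick(now)
        peer = rng.choice([p for p in members if p is not m])
        peer.merge(m.table, now)
```

Because a member only needs to hear a fresh heartbeat from anyone, not from the monitored member directly, each participant sends a single message per round regardless of group size. In this sketch, calling `suspects(now)` on any member returns the names it currently considers failed.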