A gossip-style failure detection service

  • Authors:
  • Robbert van Renesse;Yaron Minsky;Mark Hayden

  • Affiliations:
  • Cornell University, Ithaca, NY;Cornell University, Ithaca, NY;Cornell University, Ithaca, NY

  • Venue:
  • Middleware '98 Proceedings of the IFIP International Conference on Distributed Systems Platforms and Open Distributed Processing
  • Year:
  • 2009

Quantified Score

Hi-index 0.00

Visualization

Abstract

Failure Detection is valuable for system management, replication, load balancing, and other distributed services. To date, Failure Detection Services scale badly in the number of members that are being monitored. This paper describes a new protocol based on gossiping that does scale well and provides timely detection. We analyze the protocol, and then extend it to discover and leverage the underlying network topology for much improved resource utilization. We then combine it with another protocol, based on broadcast, that is used to handle partition failures.