On the Implementation of Unreliable Failure Detectors in Partially Synchronous Systems

  • Authors:
  • M. Larrea; A. Fernandez; S. Arevalo

  • Affiliations:
  • Dept. de Arquitectura y Tecnologia de Comput., Pais Vasco Univ., San Sebastian, Spain

  • Venue:
  • IEEE Transactions on Computers
  • Year:
  • 2004

Abstract

Unreliable failure detectors were proposed by Chandra and Toueg as mechanisms that provide information about process failures. Chandra and Toueg defined eight classes of failure detectors, depending on how accurate this information is, and presented an algorithm implementing a failure detector of one of these classes in a partially synchronous system. This algorithm is based on all-to-all communication and periodically exchanges a number of messages that is quadratic in the number of processes. In this paper, we study the implementability of different classes of failure detectors in several models of partial synchrony. We first show that no failure detector with perpetual accuracy (namely, $\mathcal{P}$, $\mathcal{Q}$, $\mathcal{S}$, and $\mathcal{W}$) can be implemented in these models, even in systems where only a single process may fail. We also show that, in these models of partial synchrony, a majority of correct processes is necessary to implement a failure detector of the class $\Theta$ proposed by Aguilera et al. Then, we present a family of distributed algorithms that implement the four classes of unreliable failure detectors with eventual accuracy (namely, $\diamond\mathcal{P}$, $\diamond\mathcal{Q}$, $\diamond\mathcal{S}$, and $\diamond\mathcal{W}$). Our algorithms are based on a logical ring arrangement of the processes, which defines the monitoring and failure-information propagation pattern. The resulting algorithms periodically exchange at most a linear number of messages.
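The contrast between all-to-all monitoring and a ring-based pattern can be illustrated with a small round-based simulation. This is a minimal sketch, not the authors' algorithm: it assumes crash failures that happen before the first round, synchronous rounds (as if the system had already stabilized, so no timeout adaptation is modeled), and a ping/reply scheme chosen here for simplicity, in which each live process queries its nearest non-suspected predecessor on the ring and adopts the suspicion list piggybacked on the reply. The function names `nearest_predecessor` and `simulate` are hypothetical. The sketch shows every live process eventually suspecting exactly the crashed ones while each round costs at most two messages per live process.

```python
"""Illustrative ring-based failure monitoring (hypothetical sketch, not the paper's algorithm).

Assumptions: crash failures only, synchronous rounds, and a ping/reply
pattern where replies piggyback the replier's current suspicion list.
"""


def nearest_predecessor(p, n, suspected):
    """Closest process before p on the ring 0..n-1 that p does not suspect."""
    q = (p - 1) % n
    while q != p and q in suspected:
        q = (q - 1) % n
    return q  # equals p only if p suspects every other process


def simulate(n, crashed, rounds):
    """Run synchronous rounds; return final suspicion sets and per-round message counts."""
    alive = [p for p in range(n) if p not in crashed]
    suspected = {p: set() for p in alive}
    messages_per_round = []
    for _ in range(rounds):
        messages = 0
        targets = {}  # monitor -> process it pinged this round
        answers = {}  # monitor -> suspicion list received in the reply
        # Phase 1: every live process pings its nearest non-suspected predecessor.
        for q in alive:
            target = nearest_predecessor(q, n, suspected[q])
            if target == q:
                continue  # nobody left to monitor
            targets[q] = target
            messages += 1  # the ping
            if target not in crashed:
                messages += 1  # the reply; crashed processes never answer
                answers[q] = frozenset(suspected[target])  # start-of-round snapshot
        messages_per_round.append(messages)
        # Phase 2: update suspicions from the replies (or their absence).
        for q, target in targets.items():
            if q in answers:
                suspected[q].discard(target)       # target answered, so it is alive
                suspected[q] |= answers[q] - {q}   # adopt failures detected upstream
            else:
                suspected[q].add(target)           # no reply within the round
    return suspected, messages_per_round


if __name__ == "__main__":
    final, counts = simulate(n=6, crashed={2, 3}, rounds=10)
    print(final)   # every live process ends up suspecting {2, 3}
    print(counts)  # at most 2 messages per live process per round
```

With n = 6, one round of all-to-all heartbeats would cost n(n-1) = 30 messages, whereas the ring pattern above never exceeds 2n = 12. Failure information travels one hop per round along the ring, so in this sketch detection by all live processes takes a number of rounds that grows with n; that latency is the price paid for the linear message cost.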