A necessary and sufficient condition for transforming limited accuracy failure detectors

  • Authors:
  • E. Anceaume;A. Fernández;A. Mostefaoui;G. Neiger;M. Raynal

  • Affiliations:
  • IRISA, Campus de Beaulieu, Université de Rennes 1, 35042 Rennes Cedex, France;Universidad Rey Juan Carlos, 28933 Móstoles, Madrid, Spain;IRISA, Campus de Beaulieu, Université de Rennes 1, 35042 Rennes Cedex, France;Intel Corporation, JF3-332, 2111 NE 25th Avenue, Hillsboro, OR;IRISA, Campus de Beaulieu, Université de Rennes 1, 35042 Rennes Cedex, France

  • Venue:
  • Journal of Computer and System Sciences
  • Year:
  • 2004

Abstract

Unreliable failure detectors are oracles that give information about process failures. Chandra and Toueg were the first to study such failure detectors for distributed systems, and they identified several that enable the solution of the Consensus problem in asynchronous distributed systems. This paper focuses on two of these, denoted S (strong) and ◇S (eventually strong). The characteristics of a given unreliable failure detector are usually described by its completeness and accuracy properties. Completeness is a requirement on the actual detection of failures, while accuracy limits the mistakes a failure detector can make. Let the scope of the accuracy property of an unreliable failure detector be the minimum number (k) of processes that may not erroneously suspect a correct process to have crashed. Usual failure detectors implicitly consider a scope equal to n (the total number of processes). Accuracy properties with limited scope give rise to the classes of failure detectors that we call S_k and ◇S_k. This paper investigates the following question: "Given S_k and ◇S_k, under which condition is it possible to transform their failure detectors into their counterparts with unlimited accuracy, i.e., S and ◇S?". The paper answers this question in the following way. It first presents a particularly simple protocol that realizes such a transformation when f < k (where f is the maximum number of processes that may crash). Then, it shows that there is no reduction protocol when f ≥ k.
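
The scope definition and the main result stated in the abstract can be written compactly as follows. This is only a sketch of one possible formalization, with the limited-scope strong and eventually strong classes written S_k and ◇S_k. The symbols \Pi (the set of n processes), \mathit{Correct} (the set of processes that never crash), \mathit{suspected}_q(t) (the set of processes that q suspects at time t), and the relation \rightsquigarrow ("can be transformed into") are notation assumed here for illustration and are not taken from the abstract.

```latex
% Assumed notation: \Pi, Correct, suspected_q(t), \rightsquigarrow are illustrative.
% Perpetual accuracy with scope k (class S_k):
\exists p \in \mathit{Correct},\ \exists Q \subseteq \Pi,\ |Q| \ge k :\quad
  \forall t,\ \forall q \in Q :\ p \notin \mathit{suspected}_q(t)

% Eventual accuracy with scope k (class \Diamond S_k):
\exists p \in \mathit{Correct},\ \exists Q \subseteq \Pi,\ |Q| \ge k,\ \exists t_0 :\quad
  \forall t \ge t_0,\ \forall q \in Q :\ p \notin \mathit{suspected}_q(t)

% Scope k = n corresponds to the usual classes S and \Diamond S.
% Result stated in the abstract, with f the maximum number of crashes:
\bigl( S_k \rightsquigarrow S \ \text{ and } \ \Diamond S_k \rightsquigarrow \Diamond S \bigr)
  \iff f < k
```

Read this way, the condition f < k guarantees that the k processes covered by the accuracy property cannot all crash, which is consistent with the abstract's claim that no reduction exists once f ≥ k.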