A methodology to design arbitrary failure detectors for distributed protocols

  • Authors:
  • Roberto Baldoni; Jean-Michel Hélary; Sara Tucci Piergiovanni

  • Affiliations:
  • Dipartimento di Informatica e Sistemistica "Antonio Ruberti", Università di Roma La Sapienza, Via Ariosto 25, Roma, Italy; IRISA, Campus de Beaulieu, 35042 Rennes-Cedex, France; Dipartimento di Informatica e Sistemistica "Antonio Ruberti", Università di Roma La Sapienza, Via Ariosto 25, Roma, Italy

  • Venue:
  • Journal of Systems Architecture: the EUROMICRO Journal
  • Year:
  • 2008

Abstract

Nowadays, there are many protocols able to cope with process crashes, but, unfortunately, a process crash represents only one particular faulty behavior. Handling tougher failures (e.g., send omission failures, receive omission failures, arbitrary failures), which can arise from malicious attacks or unexpected software errors, is a real practical challenge. This is usually achieved either by changing, in an ad hoc manner, the code of a crash-resilient protocol or by devising a new protocol from scratch. This paper proposes an alternative methodology to detect processes experiencing arbitrary failures. On this basis, it introduces the notions of liveness failure detector and safety failure detector as two independent software components. With this approach, the nature of failures experienced by processes becomes transparent to the protocol using the components. This methodology brings two advantages: first, it makes it possible to increase the resilience of a protocol designed in a crash-failure context without changing its code, by concentrating only on the design of a few well-specified components; second, it clearly separates the task of designing the protocol from the task of detecting faulty processes, which is a methodological improvement in itself. Finally, the feasibility of this approach is shown by providing implementations of liveness failure detectors and safety failure detectors for two protocols: one solving consensus and the other solving the global data computation problem.
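
To make the component view described in the abstract concrete, the following minimal Java sketch (hypothetical names and signatures, not taken from the paper) illustrates how a protocol designed for crash failures could consult a liveness failure detector and a safety failure detector as separate components, keeping the nature of the underlying failures transparent to the protocol code.

```java
import java.util.Set;

/** Hypothetical liveness failure detector: suspects processes that
 *  appear to have stopped contributing to the protocol. */
interface LivenessFailureDetector {
    /** Identifiers of processes currently suspected of being silent. */
    Set<Integer> suspected();
}

/** Hypothetical safety failure detector: flags messages whose content
 *  deviates from what a correct process could have sent. */
interface SafetyFailureDetector {
    /** Returns true if the message attributed to process p is
     *  inconsistent with the protocol specification. */
    boolean deviates(int p, byte[] message);
}

/** A crash-resilient protocol skeleton that only queries the two
 *  detectors; its own logic is unaware of how failures manifest. */
class ProtocolSkeleton {
    private final LivenessFailureDetector lfd;
    private final SafetyFailureDetector sfd;

    ProtocolSkeleton(LivenessFailureDetector lfd, SafetyFailureDetector sfd) {
        this.lfd = lfd;
        this.sfd = sfd;
    }

    void onReceive(int sender, byte[] message) {
        if (lfd.suspected().contains(sender)) return; // ignore processes suspected of being silent
        if (sfd.deviates(sender, message)) return;    // discard content from processes behaving arbitrarily
        // ... unchanged crash-failure protocol logic would run here ...
    }
}
```

Under this sketch, raising the resilience of the protocol amounts to plugging in stronger implementations of the two detector interfaces, without touching ProtocolSkeleton itself.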