Preventing Useless Checkpoints in Distributed Computations

  • Authors:
  • Jean-Michel Helary;Achour Mostefaoui;Michel Raynal

  • Affiliations:
  • -;-;-

  • Venue:
  • SRDS '97 Proceedings of the 16th Symposium on Reliable Distributed Systems
  • Year:
  • 1997

Quantified Score

Hi-index 0.00

Visualization

Abstract

A useless checkpoint is a local checkpoint that cannot be part of a consistent global checkpoint. This paper addresses the following important problem. Given a set of processes that take (basic) local checkpoints in an independent and unknown way, the problem is to design a communication-induced checkpointing protocol that directs processes to take additional local (forced) checkpoints to ensure that no local checkpoint is useless.A general and efficient protocol answering this problem is proposed. It is shown that several existing protocols that solve the same problem are particular instances of it. The design of this general protocol is motivated by the use of communication-induced checkpointing protocols in ``consistent global checkpoint''-based distributed applications. Detection of stable or unstable properties, rollback-recovery, and determination of distributed breakpoints are examples of such applications.