Checkpoint Processing in Distributed Systems Software Using Synchronized Clocks

  • Authors:
  • Affiliations:
  • Venue: ITCC '01 Proceedings of the International Conference on Information Technology: Coding and Computing
  • Year: 2001

Abstract

Taking checkpoints in a truly distributed manner, that is, without a global checkpoint coordinator, is difficult. This paper addresses the problem in a system that uses loosely synchronized clocks. The constituent processes take their checkpoints according to their own clocks at predetermined checkpoint instants. Because these checkpoints are taken asynchronously, some synchronization among the processes is needed to determine a globally consistent set of checkpoints. The synchronization information is appended to the clock synchronization messages, which the constituent processes then use for checkpoint synchronization. Communication in this system is synchronous, so a process may be blocked on communication at a checkpoint instant; such a process takes its checkpoint after it unblocks. The paper shows that the set of i-th checkpoints is consistent, and hence that the rollback required after a failure extends only to the last saved state.
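
The timing rule described in the abstract can be illustrated with a small simulation. This is only a minimal sketch under stated assumptions, not the paper's algorithm: the `Process` class, the `run` helper, the integer time units, the fixed checkpoint interval, and the catch-up behavior after a long block are all hypothetical choices made for illustration. It does not model the synchronization information carried on clock synchronization messages or the consistency argument.

```python
from dataclasses import dataclass, field


@dataclass
class Process:
    """Hypothetical process model; all names and fields are illustrative."""
    name: str
    clock_offset: int      # assumed fixed skew of the local clock from real time
    interval: int          # assumed spacing between predetermined checkpoint instants
    blocked: bool = False  # True while waiting on a synchronous communication
    next_index: int = 1    # index i of the next checkpoint to take
    checkpoints: list = field(default_factory=list)  # (index, local time taken)

    def local_time(self, real_time: int) -> int:
        return real_time + self.clock_offset

    def step(self, real_time: int) -> None:
        """Take every checkpoint that is due, unless blocked on communication."""
        if self.blocked:
            return  # a blocked process defers its checkpoint until it unblocks
        now = self.local_time(real_time)
        # Assumption: after unblocking, all overdue checkpoints are taken at once.
        while now >= self.next_index * self.interval:
            self.checkpoints.append((self.next_index, now))
            self.next_index += 1


def run(processes, schedule, end_time):
    """Advance real time one unit at a time.

    `schedule` (hypothetical) maps a real time to {process name: blocked flag}.
    """
    for t in range(end_time + 1):
        for p in processes:
            if t in schedule and p.name in schedule[t]:
                p.blocked = schedule[t][p.name]
            p.step(t)


p0 = Process("P0", clock_offset=0, interval=10)
p1 = Process("P1", clock_offset=2, interval=10)
# P1 blocks at real time 7 and unblocks at real time 12.
run([p0, p1], {7: {"P1": True}, 12: {"P1": False}}, end_time=25)
print(p0.checkpoints)
print(p1.checkpoints)
```

In this run, P1's first checkpoint instant falls while it is blocked, so it takes that checkpoint only when it unblocks; its second checkpoint is taken on schedule according to its own clock.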