Using computing checkpoints implement consistent low-cost non-blocking coordinated checkpointing

  • Authors:
  • Chaoguang Men;Xiaozong Yang

  • Affiliations:
  • School of Computer Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang, P.R.China;School of Computer Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang, P.R.China

  • Venue:
  • PDCAT'04 Proceedings of the 5th international conference on Parallel and Distributed Computing: Applications and Technologies
  • Year:
  • 2004

Abstract

Two approaches are used to reduce the overhead of coordinated checkpointing: one is to reduce the number of synchronization messages and the number of checkpoints; the other is to make the checkpointing process non-blocking. In this paper, we introduce the concept of a “computing checkpoint” to design an efficient, consistent, non-blocking coordinated checkpointing algorithm that combines these two approaches. By piggybacking on the broadcast commit message the information about which processes have taken new checkpoints, the algorithm keeps the checkpoint sequence number of every process consistent across all processes, so that unnecessary checkpoints and orphan messages are avoided in subsequent execution. The algorithm does not block any process and has lower overhead than other proposed consistent coordinated checkpointing algorithms.
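
The piggybacking idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the abstract does not give data structures or message formats, so the `Process` class, the per-process `csn` vector, the `on_commit` handler, and the `broadcast_commit` helper are all hypothetical. The sketch also assumes reliable delivery of the commit message to every process and that each new checkpoint advances a sequence number by one. It leaves out the tentative-checkpoint phase and the computing checkpoints themselves, and shows only how a piggybacked set of process identifiers could keep every process's view of the sequence numbers identical.

```python
from dataclasses import dataclass, field


@dataclass
class Process:
    """Hypothetical process model; names are illustrative, not from the paper."""
    pid: int
    n: int  # total number of processes in the system
    # csn[j] is this process's view of process j's checkpoint sequence number
    csn: list = field(default_factory=list)

    def __post_init__(self):
        if not self.csn:
            self.csn = [0] * self.n

    def on_commit(self, checkpointed):
        """Handle a commit message that piggybacks the set of pids
        that took a new checkpoint in this round (assumed format)."""
        for j in checkpointed:
            self.csn[j] += 1


def broadcast_commit(processes, checkpointed):
    """Deliver the commit message, with its piggybacked set, to every process.
    Reliable delivery is an assumption of this sketch."""
    for p in processes:
        p.on_commit(checkpointed)


# Two checkpointing rounds among four processes.
procs = [Process(pid=i, n=4) for i in range(4)]
broadcast_commit(procs, {0, 2})  # round 1: processes 0 and 2 checkpointed
broadcast_commit(procs, {2, 3})  # round 2: processes 2 and 3 checkpointed

# Every process now holds the same view of all sequence numbers.
views = [p.csn for p in procs]
```

Because every process applies the same piggybacked set, processes that did not take part in a round still learn the current sequence numbers of those that did. Under the assumptions above, this is what would let a process later recognise a stale or already-covered checkpoint request without exchanging extra synchronization messages.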