A Communication-Induced Checkpointing Protocol that Ensures Rollback-Dependency Trackability

  • Authors:
  • Roberto Baldoni

  • Affiliations:
  • -

  • Venue:
  • FTCS '97 Proceedings of the 27th International Symposium on Fault-Tolerant Computing (FTCS '97)
  • Year:
  • 1997

Quantified Score

Hi-index 0.01

Visualization

Abstract

Considering an application in which processes take local checkpoints independently (called basic checkpoints), this paper develops a protocol that forces them to take some additional local checkpoints (called forced checkpoints) in order that the resulting checkpointing and communication pattern satisfies the Rollback Dependency Trackability (RDT) property. This property states that all dependencies between local checkpoints are on-line trackable by using a transitive dependency vector. Compared to other protocols ensuring the RDT property, the proposed protocol is less conservative in the sense that it takes less additional local checkpoints. It attains this goal by a subtle tracking of causal dependencies on already taken checkpoints; this tracking is then used to prevent the occurrence of hidden dependencies. As indicated by simulation study, the proposed protocol compares favorably with other protocols; moreover, it additionally associates on-the-fly with each local checkpoint C the minimum global checkpoint to which C belongs.