Cloning-Based Checkpoint for Localized Recovery

  • Authors:
  • Zunce Wei; Hon F. Li; Dhrubajyoti Goswami

  • Affiliations:
  • Concordia University; Concordia University; Concordia University

  • Venue:
  • ISPAN '05 Proceedings of the 8th International Symposium on Parallel Architectures, Algorithms and Networks
  • Year:
  • 2005

Abstract

This paper studies the use of process clones to localize recovery in large-scale distributed systems. A clone is a virtual recovery process with a limited lifetime, and it is useful for decoupling recovery dependencies among checkpoints. A generic Checkpoint Dependency Graph (CDG) model is used to capture the dependency relations among checkpoints. A Non-atomic Group Checkpoint (NGC) protocol is presented, and it is proved that, when clones are employed, the protocol can result in localized recovery involving a single group. To limit the spread of recovery, the size of a group should be bounded. The paper presents several results in this regard: (i) there is no embedded protocol for atomic group formation with a bounded group size (a k-bounded protocol); (ii) a k-bounded atomic group checkpoint protocol requires at least m-1 explicit messages for checkpoint synchronization in a system of m processes. Lastly, a simple k-bounded atomic group checkpoint protocol is presented and proved correct.
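To make "recovery spread" concrete, the following is a minimal sketch and not the paper's CDG model or NGC protocol. It assumes a simplified, process-level dependency relation in which an edge p -> q means "if p rolls back, q must also roll back". Under that assumption, the set of processes involved in recovery is the set reachable from the failed process, and a k-bounded group simply means that set never exceeds k members. The names `recovery_spread`, `is_localized`, and the example processes are hypothetical, chosen only for illustration.

```python
from collections import deque


def recovery_spread(deps, failed):
    """Return every process forced to roll back when `failed` fails.

    Hypothetical simplification: deps[p] lists the processes whose
    rollback is forced by p's rollback (process-level, not checkpoint-level).
    """
    spread = {failed}
    queue = deque([failed])
    while queue:  # breadth-first search over dependency edges
        p = queue.popleft()
        for q in deps.get(p, ()):
            if q not in spread:
                spread.add(q)
                queue.append(q)
    return spread


def is_localized(deps, failed, k):
    """True if recovery from `failed` involves at most k processes."""
    return len(recovery_spread(deps, failed)) <= k


deps = {"P1": ["P2"], "P2": ["P3"], "P3": [], "P4": ["P5"]}
print(sorted(recovery_spread(deps, "P1")))  # ['P1', 'P2', 'P3']
```

In this toy model, "decoupling" a recovery dependency (the role the abstract assigns to clones) can be pictured as deleting an edge: removing P2 -> P3 shrinks the spread from P1 to {P1, P2}. How clones actually achieve this is defined in the paper, not here.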