A multi-cycle checkpointing protocol that ensures strict 1-rollback

  • Authors:
  • Yi-Wei Ci; Zhan Zhang; De-Cheng Zuo; Zhi-Bo Wu; Xiao-Zong Yang

  • Affiliations:
  • Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong and Institute of Software, Chinese Academy of Sciences, Beijing, China; School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China; School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China; School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China; School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China

  • Venue:
  • Information Processing Letters
  • Year:
  • 2012

Abstract

This paper proposes a checkpointing protocol based on loose synchronization. The protocol lets processes take checkpoints at different frequencies, so that each process can control its own rollback distance. In traditional asynchronous and quasi-synchronous checkpointing protocols, checkpoints that are not up to date may be used for recovery, so the rollback distance is often difficult to control. The proposed protocol dynamically adjusts the checkpoint cycle of each process using a pessimistic scheme so that strict 1-rollback is achieved; that is, one of the last two checkpoints of each process can be used for recovery.
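
The strict 1-rollback property stated above can be made concrete with a small check. The sketch below is not the protocol from the paper; it only tests whether a given recovery line uses one of the last two checkpoints of every process. The function names, the integer checkpoint sequence numbers, and the definition of rollback distance as "number of newer checkpoints discarded" are assumptions made for illustration.

```python
from typing import Dict, List


def rollback_distance(checkpoints: List[int], chosen: int) -> int:
    """Count the checkpoints taken after `chosen` (0 means the latest one).

    `checkpoints` holds hypothetical sequence numbers in the order taken.
    """
    return len(checkpoints) - 1 - checkpoints.index(chosen)


def satisfies_strict_1_rollback(
    history: Dict[str, List[int]], recovery_line: Dict[str, int]
) -> bool:
    """True if every process recovers from one of its last two checkpoints."""
    return all(
        rollback_distance(history[p], recovery_line[p]) <= 1 for p in history
    )


# Processes checkpoint at different frequencies (illustrative data only).
history = {"p0": [1, 2, 3, 4], "p1": [1, 2], "p2": [1, 2, 3]}

# Each process uses its latest or second-latest checkpoint.
good_line = {"p0": 4, "p1": 1, "p2": 3}

# p0 falls back two checkpoints, which violates the property.
stale_line = {"p0": 2, "p1": 2, "p2": 3}

print(satisfies_strict_1_rollback(history, good_line))
print(satisfies_strict_1_rollback(history, stale_line))
```

Here `good_line` passes the check and `stale_line` fails it, because `p0` would discard two newer checkpoints. How the protocol itself adjusts checkpoint cycles to guarantee this is described in the paper, not in this sketch.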