Probabilistic Checkpointing

  • Authors:
  • Hyo-chang Nam; Jong Kim; SungJe Hong; Sunggu Lee

  • Venue:
  • FTCS '97: Proceedings of the 27th International Symposium on Fault-Tolerant Computing
  • Year:
  • 1997

Abstract

Many optimization schemes have been proposed to reduce the overhead of checkpointing. Incremental checkpointing based on memory page protection is one of the more successful of these: it reduces overhead and improves checkpointing performance. In this paper, we propose two checkpointing schemes, called "block encoding" and "combined block encoding", which reduce the checkpointing overhead further. The smallest unit of checkpoint data in our schemes is a block, which is smaller than a page, so less checkpoint data is required than with page-based incremental checkpointing. One drawback of the proposed schemes is the possibility of aliasing in the encoded words. We show, however, that the aliasing probability is near zero when an 8-byte encoded word is used. The performance of the proposed schemes is both analyzed and measured experimentally. First, we construct an analytic model that predicts the checkpointing overhead. Using this model, we can estimate the block size that gives the best performance for a given target program. Next, we implement the proposed schemes on libckpt, a general-purpose checkpointing library for Unix-based systems developed at the University of Tennessee. According to our experimental results, the proposed schemes reduce the overhead by 11.7% in the best case and increase it by 0.5% in the worst case, in comparison with page-based incremental checkpointing. In most cases, the combined block encoding scheme improves on both block encoding and page-based incremental checkpointing.
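
The idea of detecting changes per block through a short encoded word can be illustrated with a minimal Python sketch. It is not the paper's implementation: the abstract does not specify the encoding function, so BLAKE2b truncated to 8 bytes is used here as a stand-in, and the names encode_block, split_blocks and changed_blocks, as well as the 4096-byte page and 256-byte block sizes, are assumptions made for illustration.

```python
import hashlib
from typing import List, Tuple

PAGE_SIZE = 4096   # assumed page size, for comparison only
BLOCK_SIZE = 256   # assumed block size; the paper derives it from a model


def encode_block(block: bytes) -> bytes:
    """Return an 8-byte encoded word for a block (BLAKE2b is a stand-in)."""
    return hashlib.blake2b(block, digest_size=8).digest()


def split_blocks(memory: bytes, block_size: int) -> List[bytes]:
    """Cut the memory image into consecutive fixed-size blocks."""
    return [memory[i:i + block_size] for i in range(0, len(memory), block_size)]


def changed_blocks(memory: bytes, saved_words: List[bytes],
                   block_size: int) -> Tuple[List[int], List[bytes]]:
    """Compare each block's encoded word with the one saved at the previous
    checkpoint; return the indices that differ and the new word table."""
    new_words = [encode_block(b) for b in split_blocks(memory, block_size)]
    changed = [i for i, word in enumerate(new_words)
               if i >= len(saved_words) or word != saved_words[i]]
    return changed, new_words


# First checkpoint: every block is new, so only the word table is kept here.
memory = bytearray(4 * PAGE_SIZE)
_, words = changed_blocks(bytes(memory), [], BLOCK_SIZE)

# The program writes a single byte, then a second checkpoint is taken.
memory[5000] = 0xFF
changed, words = changed_blocks(bytes(memory), words, BLOCK_SIZE)

# Data saved per scheme: changed blocks versus the pages containing them.
block_bytes = len(changed) * BLOCK_SIZE
page_bytes = len({i * BLOCK_SIZE // PAGE_SIZE for i in changed}) * PAGE_SIZE
print(changed, block_bytes, page_bytes)
```

In this toy run a one-byte write costs one 256-byte block instead of one 4096-byte page. The sketch also shows where aliasing enters: a modified block whose new encoded word happens to equal the old one would be skipped. Under the idealized assumption that the encoding behaves like a uniform random function, that happens with probability of about 2^-64 per modified block for an 8-byte word; the paper's own aliasing analysis may differ.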