Probabilistic Checkpointing

Authors:
Hyo-chang Nam; Jong Kim; SungJe Hong; Sunggu Lee
Venue:
FTCS '97 Proceedings of the 27th International Symposium on Fault-Tolerant Computing (FTCS '97)
Year:
1997

Citing 0
Cited 5

A survey of rollback-recovery protocols in message-passing systems
ACM Computing Surveys (CSUR)

libhashckpt: hash-based incremental checkpointing using GPU's
EuroMPI'11 Proceedings of the 18th European MPI Users' Group conference on Recent advances in the message passing interface

A secure checkpointing protocol for survivable server design
ICDCIT'04 Proceedings of the First international conference on Distributed Computing and Internet Technology

Accelerating incremental checkpointing for extreme-scale computing
Future Generation Computer Systems

Surviving sensor node failures by MMU-less incremental checkpointing
Journal of Systems and Software

Abstract

Many optimization schemes have been proposed to reduce the overhead of checkpointing. Incremental checkpointing based on memory page protection has been one of the successful schemes for reducing this overhead and improving checkpointing performance. In this paper, we propose two checkpointing schemes, called "block encoding" and "combined block encoding", which further reduce the checkpointing overhead. The smallest unit of checkpoint data in our schemes is a block, which is smaller than a page; this reduces the amount of checkpoint data required compared with page-based incremental checkpointing. One drawback of the proposed schemes is the possibility of aliasing in encoded words. We show, however, that the aliasing probability is near zero when an 8-byte encoded word is used. The performance of the proposed schemes is analyzed with a model and measured experimentally. First, we construct an analytic model that predicts the checkpointing overhead. Using this model, we can estimate the block size that produces the best performance for a given target program. Next, the proposed schemes are implemented on libckpt, a general-purpose checkpointing library for Unix-based systems developed at the University of Tennessee. According to our experimental results, the proposed schemes reduce the overhead by 11.7% in the best case and increase the overhead by 0.5% in the worst case in comparison with page-based incremental checkpointing. In most cases, the combined block encoding scheme shows an improvement over both block encoding and page-based incremental checkpointing.
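
The block-level idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation or the libckpt API: the abstract does not specify the encoding function, so a BLAKE2b hash truncated to 8 bytes stands in for the encoded word, and the names `encode_block`, `encode_memory`, `changed_blocks`, and `aliasing_probability`, along with the 4096-byte page size and 256-byte block size, are all hypothetical choices for illustration. The aliasing estimate assumes the encoded words are uniformly distributed, which is an assumption of this sketch rather than a result taken from the paper.

```python
import hashlib

PAGE_SIZE = 4096  # assumed page size in bytes (illustrative only)


def encode_block(block) -> bytes:
    """Return an 8-byte encoded word for one block.

    Assumption: truncated BLAKE2b stands in for the paper's encoding,
    which the abstract does not describe.
    """
    return hashlib.blake2b(block, digest_size=8).digest()


def encode_memory(memory, block_size: int) -> list:
    """Encode every block of a memory image into an 8-byte word."""
    return [
        encode_block(memory[i:i + block_size])
        for i in range(0, len(memory), block_size)
    ]


def changed_blocks(memory, previous_words, block_size: int):
    """Compare current encoded words with those of the last checkpoint.

    Returns the indices of blocks whose word changed, plus the new words.
    A block that changed but kept the same word (aliasing) would be missed.
    """
    current = encode_memory(memory, block_size)
    dirty = [
        i for i, (old, new) in enumerate(zip(previous_words, current))
        if old != new
    ]
    return dirty, current


def aliasing_probability(word_bits: int = 64) -> float:
    """Chance that a modified block maps to the same encoded word,
    assuming uniformly distributed words (an idealized model)."""
    return 2.0 ** -word_bits


if __name__ == "__main__":
    block_size = 256
    memory = bytearray(2 * PAGE_SIZE)          # two zero-filled "pages"
    words = encode_memory(memory, block_size)  # state at the last checkpoint

    memory[5000] = 1                           # a single-byte write
    dirty, words = changed_blocks(memory, words, block_size)

    print("dirty blocks:", dirty)
    print("bytes to save (block-based):", len(dirty) * block_size)
    print("bytes to save (page-based): ", PAGE_SIZE)
    print("aliasing probability per changed block:", aliasing_probability())
```

In this toy run a single-byte write dirties one 256-byte block instead of a whole 4096-byte page, which is the saving the abstract attributes to block granularity. The price is the cost of computing and storing one encoded word per block, which is why the abstract's analytic model is used to choose the block size.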