A Dynamic Checkpointing Scheme Based on Reinforcement Learning

  • Authors:
  • Hiroyuki Okamura; Yuki Nishimura; Tadashi Dohi

  • Venue:
  • PRDC '04 Proceedings of the 10th IEEE Pacific Rim International Symposium on Dependable Computing (PRDC'04)
  • Year:
  • 2004

Abstract

In this paper, we develop a new checkpointing scheme for a uni-process application. First, we model the checkpointing scheme as a semi-Markov decision process and apply reinforcement learning to estimate the optimal checkpointing policy statistically. More specifically, a representative reinforcement learning algorithm, Q-learning, is used to develop an adaptive checkpointing scheme. In simulation experiments, we examine the asymptotic behavior of the system overhead under adaptive checkpointing and show quantitatively that the proposed dynamic checkpoint algorithm is useful and robust under incomplete knowledge of the failure-time distribution.
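The abstract describes learning a checkpointing policy with Q-learning, without assuming the failure-time distribution is known. The following is a minimal illustrative sketch of that idea, not the authors' implementation: states, actions, cost constants, and the failure model below are all assumptions chosen for the example. The state is the (discretized) time since the last checkpoint; at each decision epoch the agent either continues or takes a checkpoint, paying a checkpoint overhead when it does, or a recovery cost proportional to the work lost when a failure occurs.

```python
import random

# Hypothetical example: Q-learning for checkpoint placement.
# States: discretized time since the last checkpoint (0..MAX_STATE).
# Actions: 0 = continue, 1 = take a checkpoint.
# All constants below are illustrative assumptions, not values from the paper.
MAX_STATE = 10         # cap on discretized elapsed time since last checkpoint
CHECKPOINT_COST = 1.0  # assumed overhead of writing a checkpoint
RECOVERY_COST = 0.5    # assumed cost per unit of work lost on failure
FAILURE_PROB = 0.05    # per-step failure probability (unknown to the agent)
ALPHA, GAMMA, EPS = 0.1, 0.95, 0.1  # learning rate, discount, exploration

def step(state, action, rng):
    """Simulate one decision epoch; return (next_state, cost incurred)."""
    if action == 1:                      # checkpoint: pay overhead, reset clock
        return 1, CHECKPOINT_COST
    if rng.random() < FAILURE_PROB:      # failure: lose work since checkpoint
        return 0, RECOVERY_COST * state  # roll back to the last checkpoint
    return min(state + 1, MAX_STATE), 0.0

def train(episodes=2000, horizon=200, seed=0):
    """Tabular Q-learning over simulated runs; Q[s][a] estimates cost-to-go."""
    rng = random.Random(seed)
    Q = [[0.0, 0.0] for _ in range(MAX_STATE + 1)]
    for _ in range(episodes):
        s = 0
        for _ in range(horizon):
            # epsilon-greedy action selection (minimizing cost)
            if rng.random() < EPS:
                a = rng.randrange(2)
            else:
                a = 0 if Q[s][0] <= Q[s][1] else 1
            s2, cost = step(s, a, rng)
            # Q-learning update: since Q holds costs, bootstrap with min over actions
            Q[s][a] += ALPHA * (cost + GAMMA * min(Q[s2]) - Q[s][a])
            s = s2
    return Q

Q = train()
# Greedy learned policy: checkpoint in state s iff it looks cheaper than continuing
policy = [int(Q[s][1] < Q[s][0]) for s in range(MAX_STATE + 1)]
```

The learned `policy` approximates a threshold rule: as the time since the last checkpoint grows, the expected rollback cost of a failure rises, so checkpointing eventually becomes the cheaper action even though the agent never sees the failure distribution directly.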