Computing Optimal Checkpointing Strategies for Rollback and Recovery Systems

  • Authors: P. L'Ecuyer; J. Malenfant
  • Venue: IEEE Transactions on Computers - Fault-Tolerant Computing
  • Year: 1988

Abstract

A numerical approach for computing optimal dynamic checkpointing strategies for general rollback and recovery systems is presented. The system is modeled as a Markov renewal decision process. General failure distributions, random checkpointing durations, and reprocessing-dependent recovery times are allowed. The aim is to find a dynamic decision rule to maximize the average system availability over an infinite time horizon. A computational approach to approximate such a rule is proposed. This approach is based on value-iteration stochastic dynamic programming with spline or finite-element approximation of the value and policy functions. Numerical illustrations are provided.
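
The value-iteration idea can be illustrated on a much simpler model than the one in the paper. The sketch below is not the authors' method: it assumes discrete time, a memoryless per-step failure probability, a checkpoint and a recovery that each take exactly one time step, and a truncated state space, and it uses a plain table rather than a spline or finite-element approximation. All names (`solve_checkpoint_mdp`, `p_fail`, `max_uncommitted`) are hypothetical. The state `x` is the number of work steps not yet saved; each productive step earns reward 1, and a failure subtracts the `x` lost steps, so the long-run average reward plays the role of availability.

```python
def solve_checkpoint_mdp(p_fail=0.01, max_uncommitted=200, tol=1e-9, max_iter=100_000):
    """Relative value iteration for a toy average-reward checkpointing MDP.

    Returns (gain, policy). gain estimates the long-run availability;
    policy[x] is 0 (keep computing) or 1 (checkpoint) when x steps of
    work are still uncommitted. Illustrative assumptions only.
    """
    n = max_uncommitted
    h = [0.0] * (n + 1)      # relative values, normalised so that h[0] == 0
    policy = [0] * (n + 1)
    gain = 0.0
    for _ in range(max_iter):
        backed_up = [0.0] * (n + 1)
        for x in range(n + 1):
            # A failure (probability p_fail) discards the x uncommitted
            # steps and returns the system to state 0.
            on_failure = p_fail * (-x + h[0])
            # Checkpoint: no new work this step, state resets to 0.
            q_checkpoint = on_failure + (1.0 - p_fail) * h[0]
            if x < n:
                # Continue: one more unit of work, state moves to x + 1.
                q_continue = on_failure + (1.0 - p_fail) * (1.0 + h[x + 1])
            else:
                # Truncation assumption: a checkpoint is forced at x == n.
                q_continue = float("-inf")
            if q_continue >= q_checkpoint:
                backed_up[x], policy[x] = q_continue, 0
            else:
                backed_up[x], policy[x] = q_checkpoint, 1
        # The change at the reference state 0 estimates the average reward.
        gain = backed_up[0] - h[0]
        new_h = [v - backed_up[0] for v in backed_up]
        delta = max(abs(a - b) for a, b in zip(new_h, h))
        h = new_h
        if delta < tol:
            break
    return gain, policy


gain, policy = solve_checkpoint_mdp()
print("estimated availability:", round(gain, 4))
print("checkpoint once uncommitted work reaches:", policy.index(1))
```

In this toy model the computed rule is a threshold on the amount of uncommitted work. Handling general failure distributions, random checkpoint durations, and reprocessing-dependent recovery times, as the abstract describes, would require a richer state and a function approximation in place of the table.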