Optimal Choice of Checkpointing Interval for High Availability

  • Authors:
  • Diana Szentivanyi;Simin Nadjm-Tehrani;John M. Noble

  • Affiliations:
  • Linkoping University, Sweden;Linkoping University, Sweden;Linkoping University, Sweden

  • Venue:
  • PRDC '05 Proceedings of the 11th Pacific Rim International Symposium on Dependable Computing
  • Year:
  • 2005

Abstract

Supporting high availability by checkpointing and switching to a backup upon failure of a primary has a cost. Trade-off studies help system architects decide whether higher availability is worth the cost of higher response time. That decision then guides the configuration of a fault-tolerant server for best performance. This paper provides a mathematical model, based on queuing theory, for computing the optimal checkpointing interval for a primary-backup replicated server. The optimization criterion is system availability. The model points to a checkpointing interval that is short enough to give low failover time, but long enough to leave most of the system resources for servicing client requests. The novelty of the work is the detailed modelling of service times, wait times for earlier calls in the queue, and priority of checkpointing calls over client calls within the queues. Studies of the model in Mathematica and validation of a modelling assumption through simulations are included.
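The trade-off described in the abstract can be illustrated with a deliberately simplified sketch. This is not the paper's queuing model: it swaps in the classic first-order approximation usually attributed to Young, in which checkpointing costs a fixed amount of server time per interval and a failure loses, on average, half an interval of work. All names and parameter values (`checkpoint_cost`, `failure_rate`, `switch_time`) are assumptions made for illustration only.

```python
import math


def expected_overhead(interval, checkpoint_cost, failure_rate, switch_time=0.0):
    """Toy first-order estimate of the fraction of time not spent serving clients.

    Assumptions (not taken from the paper):
      - each interval spends `checkpoint_cost` seconds on checkpointing;
      - failures arrive at `failure_rate` per second;
      - a failure costs `switch_time` plus, on average, half an interval of
        work that must be redone since the last checkpoint.
    """
    checkpoint_fraction = checkpoint_cost / interval
    failover_fraction = failure_rate * (switch_time + interval / 2.0)
    return checkpoint_fraction + failover_fraction


def young_interval(checkpoint_cost, failure_rate):
    """Closed-form minimiser of the toy overhead: sqrt(2 * cost / rate)."""
    return math.sqrt(2.0 * checkpoint_cost / failure_rate)


def best_interval_on_grid(candidates, checkpoint_cost, failure_rate, switch_time=0.0):
    """Brute-force check: pick the candidate interval with the lowest overhead."""
    return min(
        candidates,
        key=lambda t: expected_overhead(t, checkpoint_cost, failure_rate, switch_time),
    )


if __name__ == "__main__":
    cost = 2.0      # hypothetical: seconds of server time per checkpoint
    rate = 1e-4     # hypothetical: failures per second
    print("closed form:", young_interval(cost, rate))
    print("grid search:", best_interval_on_grid(range(10, 1001, 10), cost, rate))
```

Even in this toy setting, an interval that is too short wastes capacity on checkpointing, while one that is too long inflates the expected failover time. The paper's model refines this picture with service times, queue wait times, and the priority of checkpointing calls over client calls, none of which the sketch captures.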