Optimal checkpointing interval of a communication system with rollback recovery

  • Authors:
  • M. Kimura; K. Yasui; T. Nakagawa; N. Ishii

  • Affiliations:
  • Department of International Cultural Studies, Gifu City Women's College, Gifu 501-0192, Japan; Department of Industrial Engineering, Aichi Institute of Technology, Toyota 470-0392, Japan; Department of Industrial Engineering, Aichi Institute of Technology, Toyota 470-0392, Japan; Department of Intelligence and Computer Science, Nagoya Institute of Technology, Nagoya 466-8555, Japan

  • Venue:
  • Mathematical and Computer Modelling: An International Journal
  • Year:
  • 2003

Quantified Score

Hi-index 0.98

Abstract

This paper considers a communication system consisting of many processors and studies how to improve its reliability through checkpointing and rollback recovery. When either a processor failure or a communication error occurs, the processors associated with the event are rolled back to their most recent checkpoint, so that a consistent state is maintained across the whole system. A stochastic model of these recovery techniques is formulated using the theory of Markov renewal processes. The mean time to take a checkpoint and the expected numbers of rollback recoveries caused by processor failures and communication errors are derived. Further, an optimal checkpointing interval that minimizes the expected cost is discussed analytically.
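
To illustrate the trade-off behind an optimal checkpointing interval, the following is a minimal sketch under a much simpler model than the paper's Markov renewal formulation, and it is not the authors' method. It assumes a single processor, failures arriving as a Poisson process with rate lam, a fixed checkpoint cost C, a fixed failure-free recovery time R, and rollback to the most recent checkpoint. All names and parameter values are hypothetical. Under these assumptions the expected time to finish one interval of T units of work plus its checkpoint is (1/lam + R)(exp(lam(T + C)) - 1). The sketch minimizes the expected time per unit of work numerically and compares the result with Young's first-order approximation sqrt(2C/lam).

```python
import math


def expected_time_per_unit_work(T, lam, C, R):
    """Expected wall-clock time per unit of useful work (simplified model).

    Assumptions (not taken from the paper): exponential failures with rate
    lam, checkpoint cost C, failure-free recovery time R, and a restart of
    the whole interval (work T plus checkpoint C) after each failure.
    """
    # expm1 keeps precision when lam * (T + C) is small.
    return (1.0 / lam + R) * math.expm1(lam * (T + C)) / T


def optimal_interval(lam, C, R, tol=1e-6):
    """Golden-section search for the interval T minimizing the cost above."""
    lo, hi = 1e-6, 10.0 / lam          # bracket assumed to contain the minimum
    phi = (math.sqrt(5.0) - 1.0) / 2.0  # inverse golden ratio, about 0.618
    while hi - lo > tol:
        a = hi - phi * (hi - lo)
        b = lo + phi * (hi - lo)
        if expected_time_per_unit_work(a, lam, C, R) < expected_time_per_unit_work(b, lam, C, R):
            hi = b                      # minimum lies in [lo, b]
        else:
            lo = a                      # minimum lies in [a, hi]
    return 0.5 * (lo + hi)


def young_interval(lam, C):
    """Young's first-order approximation to the optimal interval."""
    return math.sqrt(2.0 * C / lam)


if __name__ == "__main__":
    lam, C, R = 0.001, 5.0, 10.0        # hypothetical values for illustration
    print("numerical optimum:", optimal_interval(lam, C, R))
    print("Young approximation:", young_interval(lam, C))
```

In this simplified model the recovery time R only scales the cost and does not move the optimum. The numerical optimum and Young's approximation agree closely when the product of lam and C is small. The paper's model additionally accounts for multiple processors and communication errors, which this sketch ignores.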