Minimizing completion time of a program by checkpointing and rejuvenation

  • Authors:
  • Sachin Garg;Yennun Huang;Chandra Kintala;Kishor S. Trivedi

  • Affiliations:
  • Center for Adv. Comp. and Comm., Department of Elec. & Comp. Engg., Duke University, Durham, NC;AT&T Bell Laboratories, 600 Mountain Avenue, Murray Hill, NJ;AT&T Bell Laboratories, 600 Mountain Avenue, Murray Hill, NJ;Center for Adv. Comp. and Comm., Department of Elec. & Comp. Engg., Duke University, Durham, NC

  • Venue:
  • Proceedings of the 1996 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
  • Year:
  • 1996

Abstract

Checkpointing with rollback-recovery is a well-known technique to reduce the completion time of a program in the presence of failures. While checkpointing is corrective in nature, rejuvenation refers to preventive maintenance of software aimed at reducing unexpected failures, mostly those resulting from the "aging" phenomenon. In this paper, we show how both these techniques may be used together to further reduce the expected completion time of a program. The idea of using checkpoints to reduce the amount of rollback upon a failure is taken a step further by combining it with rejuvenation. We derive the equations for the expected completion time of a program with finite failure-free running time for the following three cases: (a) neither checkpointing nor rejuvenation is employed, (b) only checkpointing is employed, and (c) both checkpointing and rejuvenation are employed. We also present numerical results for a Weibull failure time distribution for the above three cases and discuss the optimal checkpointing and rejuvenation that minimize the expected completion time. Using the numerical results, some interesting conclusions are drawn about the benefits of these techniques in relation to the nature of the failure distribution.
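To make the three cases (a)-(c) concrete, the following is a minimal Monte Carlo sketch in Python. It is not the paper's analytical model; the abstract does not give the equations, so everything here is an assumption made for illustration: the function names `simulate` and `mean_completion_time`, the equally spaced checkpoints, the fixed costs `ckpt_cost`, `recovery`, and `rejuv_cost`, the choice to rejuvenate right after every `rejuv_every`-th checkpoint, the assumption that a failure restart or a rejuvenation resets the software's age, and the assumption that no failure occurs during recovery or rejuvenation.

```python
import random


def simulate(work, interval, ckpt_cost, recovery, shape, scale,
             rejuv_every=0, rejuv_cost=0.0, rng=random):
    """Return one sampled completion time (illustrative model, see assumptions).

    work        -- failure-free running time the program needs
    interval    -- work between checkpoints (interval >= work: no checkpoints)
    rejuv_every -- rejuvenate after every n-th checkpoint (0: never)
    """
    clock = 0.0   # wall-clock time elapsed so far
    saved = 0.0   # work protected by the most recent checkpoint
    since_rejuv = 0
    # Time left until the next failure; age is measured from the last restart.
    ttf = rng.weibullvariate(scale, shape)
    while saved < work:
        remaining = work - saved
        last = interval >= remaining          # final segment needs no checkpoint
        seg = remaining if last else interval
        need = seg if last else seg + ckpt_cost
        if ttf >= need:
            # Segment (and its checkpoint) completes without a failure.
            clock += need
            ttf -= need
            saved = work if last else saved + seg
            if not last and rejuv_every:
                since_rejuv += 1
                if since_rejuv == rejuv_every:
                    # Assumed: rejuvenation costs time and resets the age.
                    clock += rejuv_cost
                    ttf = rng.weibullvariate(scale, shape)
                    since_rejuv = 0
        else:
            # Failure: lose the segment, pay recovery, roll back to `saved`.
            clock += ttf + recovery
            ttf = rng.weibullvariate(scale, shape)
            since_rejuv = 0
    return clock


def mean_completion_time(n_runs, **params):
    """Average `simulate` over n_runs independent samples."""
    return sum(simulate(**params) for _ in range(n_runs)) / n_runs


if __name__ == "__main__":
    random.seed(0)
    # Hypothetical parameters; shape > 1 models an increasing failure rate (aging).
    base = dict(work=10.0, ckpt_cost=0.1, recovery=0.5, shape=2.0, scale=8.0)
    print("(a)", mean_completion_time(2000, interval=10.0, **base))
    print("(b)", mean_completion_time(2000, interval=2.0, **base))
    print("(c)", mean_completion_time(2000, interval=2.0, rejuv_every=1,
                                      rejuv_cost=0.1, **base))
```

Note that `random.weibullvariate` takes the scale parameter first and the shape parameter second. Sweeping `interval` and `rejuv_every` over a grid and keeping the smallest average is one simple way to approximate the optimal settings under these assumptions.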