The paper proposes a new concept for providing software fault tolerance in concurrent systems. It combines the traditional global-checkpointing mechanism with the recovery-block concept to obtain an easily implementable error-recovery mechanism. For moderate to high process interaction, this mechanism incurs smaller overhead than previously considered schemes, which are based on local checkpointing. A model for computing the optimum checkpointing interval is also presented. A particular distribution is hypothesized for the coverage of the recovery, and the behavior of the model is studied in detail for this case.
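
The abstract does not state the model itself, so the following is only a rough illustration of the trade-off that any optimum checkpointing interval balances: checkpointing too often wastes time saving state, while checkpointing too rarely loses more work per failure. The sketch uses Young's well-known first-order approximation, which is not the model proposed in this paper and ignores recovery coverage entirely. The function names, the parameter names (`checkpoint_cost`, `mtbf`), and the numeric values are assumptions chosen for illustration.

```python
import math


def young_interval(checkpoint_cost: float, mtbf: float) -> float:
    """Young's first-order approximation of the optimum interval.

    Assumption: this is a stand-in for illustration, not the paper's model.
    checkpoint_cost: time to save one checkpoint (same unit as mtbf).
    mtbf: mean time between failures.
    """
    if checkpoint_cost <= 0 or mtbf <= 0:
        raise ValueError("checkpoint_cost and mtbf must be positive")
    return math.sqrt(2.0 * checkpoint_cost * mtbf)


def overhead_fraction(interval: float, checkpoint_cost: float, mtbf: float) -> float:
    """Approximate fraction of time lost for a given interval.

    First term: time spent writing checkpoints.
    Second term: expected rework after a failure (half an interval on average).
    """
    if interval <= 0:
        raise ValueError("interval must be positive")
    return checkpoint_cost / interval + interval / (2.0 * mtbf)


if __name__ == "__main__":
    cost, mtbf = 2.0, 100.0  # hypothetical values, arbitrary time units
    best = young_interval(cost, mtbf)
    for t in (best / 2, best, best * 2):
        print(f"interval={t:6.1f}  overhead={overhead_fraction(t, cost, mtbf):.3f}")
```

Under these assumed values the approximate overhead is lowest at the computed interval and rises on either side of it, which is the qualitative behavior an optimum-interval model is meant to capture.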