Comparative Analysis of Different Models of Checkpointing and Recovery
IEEE Transactions on Software Engineering
A Case for Two-Level Recovery Schemes
IEEE Transactions on Computers
Performance analysis of checkpointing strategies
ACM Transactions on Computer Systems (TOCS)
A first order approximation to the optimum checkpoint interval
Communications of the ACM
A model of roll-back recovery with multiple checkpoints
ICSE '76 Proceedings of the 2nd international conference on Software engineering
Performance analysis of different checkpointing and recovery schemes using stochastic model
Journal of Parallel and Distributed Computing
Design, Modeling, and Evaluation of a Scalable Multi-level Checkpointing System
Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis
Containment domains: a scalable, efficient, and flexible resilience scheme for exascale systems
SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Containment domains: A scalable, efficient and flexible resilience scheme for exascale systems
Scientific Programming - Selected Papers from Super Computing 2012
Rollback recovery schemes are used in fault-tolerant distributed systems to minimize the computation lost when failures occur. One-level recovery schemes do not distinguish between failure types or their relative frequencies of occurrence, and therefore tolerate every failure with the same overhead. Two-level recovery schemes aim to protect against the more probable failures at low overhead, while protecting against the remaining failures at possibly higher overhead. In this paper, we analyze a two-level recovery scheme due to Vaidya, taking the probability of task completion on a system with limited repairs as the performance metric.
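To make the performance metric concrete, the following is a minimal Monte Carlo sketch of a two-level scheme with a limited repair budget. It is an illustrative toy model, not Vaidya's scheme and not the analytical model of the paper. Every name and parameter is an assumption introduced here: the task is split into `n_segments` equal segments, a cheap level-1 checkpoint follows each segment, a level-2 checkpoint follows every `l2_every`-th segment, frequent failures (rate `rate1`) roll back to the latest level-1 checkpoint, and rare failures (rate `rate2`) consume one of `max_repairs` repairs and roll back to the latest level-2 checkpoint. Checkpoint and recovery costs are ignored.

```python
import math
import random


def run_once(rng, n_segments, seg_time, l2_every, rate1, rate2, max_repairs):
    """Simulate one task; return True if it finishes before repairs run out.

    Assumed toy model: failures of both types arrive as independent Poisson
    processes, so a segment of length seg_time sees no failure with
    probability exp(-(rate1 + rate2) * seg_time).
    """
    total = rate1 + rate2
    p_ok = math.exp(-total * seg_time)
    p_type2 = rate2 / total if total > 0 else 0.0
    done = 0                      # segments completed and checkpointed
    repairs_left = max_repairs
    while done < n_segments:
        if rng.random() < p_ok:
            done += 1             # segment succeeds, checkpoint is taken
        elif rng.random() < p_type2:
            # Rare failure: needs a repair and a level-2 rollback.
            if repairs_left == 0:
                return False      # repair budget exhausted, task is lost
            repairs_left -= 1
            done -= done % l2_every
        # Otherwise a frequent failure: the segment is simply retried
        # from the level-1 checkpoint taken at its start.
    return True


def completion_probability(trials=20000, seed=0, **params):
    """Estimate the probability of task completion by repeated simulation."""
    rng = random.Random(seed)
    wins = sum(run_once(rng, **params) for _ in range(trials))
    return wins / trials


if __name__ == "__main__":
    base = dict(n_segments=10, seg_time=1.0, l2_every=5, rate1=0.5, rate2=0.1)
    for repairs in (0, 1, 2, 5):
        p = completion_probability(max_repairs=repairs, **base)
        print(f"max_repairs={repairs}: estimated P(completion) = {p:.3f}")
```

In this sketch, frequent failures never exhaust the repair budget directly, but each retry they cause lengthens the task's exposure to rare failures, which is how the two failure classes interact in the estimate.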