A watchdog processor based general rollback technique with multiple retries
IEEE Transactions on Software Engineering
A first order approximation to the optimum checkpoint interval
Communications of the ACM
Fault Tolerance: Principles and Practice
Design and Analysis of an Integrated Checkpointing and Recovery Scheme for Distributed Applications
IEEE Transactions on Knowledge and Data Engineering
Error Recovery in Shared Memory Multiprocessors Using Private Caches
IEEE Transactions on Parallel and Distributed Systems
The effects of using a recovery cache to save a program's variables are studied. A novel optimization model for rollback is formulated that includes the effects of a recovery cache in rollback systems. The parameters of the proposed model are the maximum recovery time, the cache size, and the save and load times associated with the task size. The results of an experimental study are also discussed. The study was conducted to estimate the program parameters that are critical for choosing a task size or cache size that minimizes the cost of recovery.