Synchronization issues in checkpointing and rollback recovery schemes have been studied in depth over the past few years. The authors investigate the possibility of deadlocks in a fully uncoordinated checkpointing system. A protocol is first illustrated for a fully uncoordinated checkpointing scheme. Rollback propagation analysis (RPA) is then performed using a stack-based algorithm. The probability of deadlock (due to rollbacks) for a finite buffer size is computed, and the optimal number of buffers required to eliminate the possibility of deadlock is calculated. Finally, the predicted buffer size is compared with the simulated result. The simulation study shows that the probability of deadlock decreases as the number of buffers increases, until an optimal buffer size is reached at which the deadlock probability becomes zero.
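
The abstract does not spell out the stack-based rollback propagation step, so the following is only a minimal illustrative sketch of the general idea, not the authors' algorithm. It assumes a simplified model in which every process has an initial checkpoint at time 0, an event at time t is covered by a checkpoint at time c when t <= c, and a message becomes an orphan when its send is undone while its receive is still kept. The function name `recovery_line` and the data layout are hypothetical.

```python
from bisect import bisect_left

def recovery_line(checkpoints, messages, failed):
    """Propagate rollbacks with an explicit stack (illustrative model only).

    checkpoints: dict mapping process -> sorted list of checkpoint times,
                 each list starting with an initial checkpoint at 0.
    messages:    list of (sender, send_time, receiver, recv_time).
    failed:      the process that fails and restarts from its latest checkpoint.
    Returns a dict mapping process -> restart time; float("inf") means the
    process does not roll back.
    """
    line = {p: float("inf") for p in checkpoints}
    line[failed] = checkpoints[failed][-1]
    stack = [failed]  # processes whose rollback may create orphan messages

    while stack:
        p = stack.pop()
        for sender, send_t, receiver, recv_t in messages:
            send_undone = sender == p and send_t > line[p]
            recv_kept = recv_t <= line[receiver]
            if send_undone and recv_kept:
                # Orphan message: move the receiver to its latest checkpoint
                # taken strictly before the receive event.
                cps = checkpoints[receiver]
                line[receiver] = cps[bisect_left(cps, recv_t) - 1]
                stack.append(receiver)
    return line

# Hypothetical three-process run in which one failure cascades.
checkpoints = {"P": [0, 4], "Q": [0, 3], "R": [0, 5]}
messages = [("P", 5, "Q", 6), ("Q", 4, "R", 4.5)]
result = recovery_line(checkpoints, messages, "P")
print(result)
```

In this made-up run, P restarts from its checkpoint at 4, which orphans the message Q received at 6 and forces Q back to 3. That in turn orphans the message R received at 4.5 and sends R back to its initial checkpoint. The sketch covers only rollback propagation; the finite-buffer deadlock analysis described in the abstract is not modeled here.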