Journal of Parallel and Distributed Computing
Checkpointing and rollback recovery are widely used techniques for handling failures in distributed computing systems. Ideally, a system avoids taking checkpoints that are useless during the recovery process. Communication-induced checkpointing algorithms guarantee that every checkpoint is useful while preserving considerable process autonomy at relatively low overhead. In this paper, we propose an enhanced communication-induced checkpointing algorithm. Our algorithm is expected to incur lower checkpointing overhead than an existing algorithm in the literature.
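
To make the idea of a communication-induced (forced) checkpoint concrete, the sketch below shows a generic index-based rule of the kind commonly described in the checkpointing literature. It is not the algorithm proposed in this paper, whose details the abstract does not give. The class names `Process` and `Message`, the `index` field, and the checkpoint labels are all assumptions introduced for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    payload: str
    index: int  # sender's checkpoint index, piggybacked on the message


@dataclass
class Process:
    name: str
    index: int = 0  # index of this process's latest checkpoint
    checkpoints: list = field(default_factory=list)  # (index, kind) pairs

    def __post_init__(self):
        # Every process starts with an initial checkpoint at index 0.
        self._take("initial")

    def _take(self, kind):
        # Hypothetical stand-in for saving state to stable storage.
        self.checkpoints.append((self.index, kind))

    def basic_checkpoint(self):
        # A checkpoint the process decides to take on its own.
        self.index += 1
        self._take("basic")

    def send(self, payload):
        # Piggyback the current checkpoint index on every outgoing message.
        return Message(payload, self.index)

    def receive(self, msg):
        # Forced checkpoint: if the sender is ahead, checkpoint BEFORE
        # delivering, so a message is never delivered in an interval whose
        # index is lower than the one it was sent in.
        if msg.index > self.index:
            self.index = msg.index
            self._take("forced")
        return msg.payload  # deliver to the application


p, q = Process("p"), Process("q")
p.basic_checkpoint()           # p moves to index 1 on its own
q.receive(p.send("hello"))     # q is behind, so it takes a forced checkpoint
p.receive(q.send("reply"))     # indices are equal, no forced checkpoint
```

In this sketch the only control information is a single integer per message, which is why rules of this kind are considered low overhead. The cost that improved algorithms try to reduce is the number of forced checkpoints triggered in `receive`.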