Future Generation Computer Systems
Communication-induced checkpointing (CIC) protocols are classified in the literature into two categories: index-based and model-based. In this paper, we discuss two data structures used in these two kinds of CIC protocols and the different roles they play in helping checkpointing algorithms enforce the Z-cycle-free (ZCF) property. We then present our Fully Informed aNd Efficient (FINE) communication-induced checkpointing algorithm, which has not only lower checkpointing overhead than the well-known Fully Informed (FI) CIC protocol proposed by Helary et al. but also lower message overhead. Performance evaluation indicates that our protocol performs better than many other existing CIC protocols.