ICS'08 Proceedings of the 12th WSEAS international conference on Systems
Communication-induced checkpointing (CIC) protocols can be used to prevent the domino effect. Among them, protocols in the index-based category have been shown to perform better. In this paper, we propose an efficient index-based CIC protocol. The fully informed (FI) protocol proposed in the literature is regarded as the best index-based CIC protocol achievable in practice, because an optimal protocol would need information about the future of the computation. We find that the enhancement adopted by FI rarely takes effect in practice. By discarding this enhancement, we obtain a new protocol, called NMMP. Simulation results show that NMMP is almost as efficient as FI in some typical computational environments. In particular, we demonstrate that the two protocols behave identically over a tree communication network. Surprisingly, NMMP needs to piggyback only constant-size control information on each message, regardless of the number of processes.
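To make the index-based idea concrete, the following is a minimal sketch of the classic index-based forced-checkpoint rule: each process numbers its checkpoints, piggybacks its current index on every message, and takes a forced checkpoint before delivering a message that carries a larger index. This is not NMMP or FI, whose rules are not given here; it only illustrates the general category and how the piggybacked control information can be a single integer. The class name `Process` and its methods are hypothetical names chosen for this illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Process:
    """Hypothetical process following the basic index-based CIC rule."""
    pid: int
    index: int = 0  # sequence number of the latest local checkpoint
    checkpoints: list = field(default_factory=list)  # (index, kind) pairs

    def __post_init__(self):
        # Every process starts from an initial checkpoint with index 0.
        self.checkpoints.append((self.index, "initial"))

    def basic_checkpoint(self):
        # A checkpoint taken on the process's own initiative.
        self.index += 1
        self.checkpoints.append((self.index, "basic"))

    def send(self, payload):
        # Piggyback one integer: constant size for any number of processes.
        return (payload, self.index)

    def receive(self, message):
        payload, sender_index = message
        if sender_index > self.index:
            # Forced checkpoint, taken before the message is delivered,
            # so that checkpoints with equal indices stay consistent.
            self.index = sender_index
            self.checkpoints.append((self.index, "forced"))
        return payload


# Usage: p checkpoints, then sends to q, which is forced to checkpoint.
p, q = Process(pid=0), Process(pid=1)
p.basic_checkpoint()
q.receive(p.send("hello"))
print(q.checkpoints)  # [(0, 'initial'), (1, 'forced')]
```

In this simple rule every message with a larger index forces a checkpoint. More refined index-based protocols such as FI and NMMP aim to skip forced checkpoints that turn out to be unnecessary.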