Optimistic recovery in distributed systems
ACM Transactions on Computer Systems (TOCS)
Checkpointing and Rollback-Recovery for Distributed Systems
IEEE Transactions on Software Engineering - Special issue on distributed systems
Clock synchronization in distributed real-time systems
IEEE Transactions on Computers - Special Issue on Real-Time Systems
Rollback recovery schemes for distributed systems
Rollback recovery schemes for distributed systems
Distributed snapshots: determining global states of distributed systems
ACM Transactions on Computer Systems (TOCS)
Experimental Evaluation of Concurrency Checkpointing and Rollback-Recovery Algorithms
Proceedings of the Sixth International Conference on Data Engineering
Publishing: a reliable broadcast communication mechanism
SOSP '83 Proceedings of the ninth ACM symposium on Operating systems principles
A survey of rollback-recovery protocols in message-passing systems
ACM Computing Surveys (CSUR)
COMPSAC '97 Proceedings of the 21st International Computer Software and Applications Conference
Checkpoint and Rollback in Asynchronous Distributed Systems
INFOCOM '97 Proceedings of the INFOCOM '97. Sixteenth Annual Joint Conference of the IEEE Computer and Communications Societies. Driving the Information Revolution
Finding a Recovery Line in Uncoordinated Checkpointing
ICDCSW '04 Proceedings of the 24th International Conference on Distributed Computing Systems Workshops - W7: EC (ICDCSW'04) - Volume 7
Hi-index | 0.00 |
A rollback recovery scheme for distributed systems is proposed. The state-save synchronization among processes is implemented by bounding clock drifts such that no state-save synchronization messages are required. Since the clocks are only loosely synchronized, the synchronization overhead can be negligible in many applications. An interprocess communication protocol which encodes state-save progress information within message frames is introduced to checkpoint consistent system states. A rollback recovery algorithm that will force a minimum number of nodes to roll back after failures is developed.