Optimistic recovery in distributed systems
ACM Transactions on Computer Systems (TOCS)
Checkpointing and Rollback-Recovery for Distributed Systems
IEEE Transactions on Software Engineering - Special issue on distributed systems
Efficient distributed recovery using message logging
Proceedings of the eighth annual ACM Symposium on Principles of distributed computing
Introduction to algorithms
Recovery in distributed systems using optimistic message logging and check-pointing
Journal of Algorithms
Real-time, concurrent checkpoint for parallel programs
PPOPP '90 Proceedings of the second ACM SIGPLAN symposium on Principles & practice of parallel programming
Manetho: Transparent Roll Back-Recovery with Low Overhead, Limited Rollback, and Fast Output Commit
IEEE Transactions on Computers - Special issue on fault-tolerant computing
Distributed snapshots: determining global states of distributed systems
ACM Transactions on Computer Systems (TOCS)
Concurrent Robust Checkpointing and Recovery in Distributed Systems
Proceedings of the Fourth International Conference on Data Engineering
A message system supporting fault tolerance
SOSP '83 Proceedings of the ninth ACM symposium on Operating systems principles
Publishing: a reliable broadcast communication mechanism
SOSP '83 Proceedings of the ninth ACM symposium on Operating systems principles
How to recover efficiently and asynchronously when optimism fails
ICDCS '96 Proceedings of the 16th International Conference on Distributed Computing Systems (ICDCS '96)
A Theory of Distributed Time
Completely Asynchronous Optimistic Recovery with Minimal Rollbacks
FTCS '95 Proceedings of the Twenty-Fifth International Symposium on Fault-Tolerant Computing
Distributed system fault tolerance using message logging and checkpointing
Distributed system fault tolerance using message logging and checkpointing
State Restoration in Systems of Communicating Processes
IEEE Transactions on Software Engineering
A survey of rollback-recovery protocols in message-passing systems
ACM Computing Surveys (CSUR)
Checkpoint-Recovery for Mobile Intelligent Networks
Proceedings of the 14th International conference on Industrial and engineering applications of artificial intelligence and expert systems: engineering of intelligent systems
Distributed recovery with K-optimistic logging
Journal of Parallel and Distributed Computing
Hi-index | 0.00 |
Basing rollback recovery on optimistic message logging and replay avoids the need for synchronization between processes during failure-free execution. Some previous research has also attempted to reduce the need for synchronization during recovery, but these protocols have suffered from three problems: not eliminating all synchronization during recovery, not minimizing rollback, or providing these properties but requiring large timestamps. This paper makes two contributions: we present a new rollback recovery protocol, based on our previous work, that provides these properties (asynchronous recovery, minimal rollback) while reducing the timestamp size; and we prove that no protocol can provide these properties and have asymptotically smaller timestamps.