We present an optimistic crash recovery technique that incurs no communication overhead during normal operation of the distributed system. The technique appends no information to application messages, does not suffer from the domino effect, and rolls each processor back at most once during recovery. We present three distributed rollback algorithms, along with their complexities and correctness proofs, and measure their performance through extensive simulations.