Efficient algorithms for optimistic crash recovery

Authors:
S. Venkatesan; Tony T-Y. Juang
Affiliations:
Computer Science Program, University of Texas at Dallas, Richardson, TX; Computer Science Program, University of Texas at Dallas, Richardson, TX
Venue:
Distributed Computing
Year:
1994

Citing 14
Cited 7

Optimistic recovery in distributed systems

ACM Transactions on Computer Systems (TOCS)
Checkpointing and Rollback-Recovery for Distributed Systems

IEEE Transactions on Software Engineering - Special issue on distributed systems
Efficient distributed recovery using message logging

Proceedings of the eighth annual ACM Symposium on Principles of distributed computing
Recovery in distributed systems using optimistic message logging and check-pointing

Journal of Algorithms
On finding and updating shortest paths distributively

Journal of Algorithms
Manetho: Transparent Roll Back-Recovery with Low Overhead, Limited Rollback, and Fast Output Commit

IEEE Transactions on Computers - Special issue on fault-tolerant computing
Distributed snapshots: determining global states of distributed systems

ACM Transactions on Computer Systems (TOCS)
A Distributed Algorithm for Minimum-Weight Spanning Trees

ACM Transactions on Programming Languages and Systems (TOPLAS)
Byzantine generals in action: implementing fail-stop processors

ACM Transactions on Computer Systems (TOCS)
Time, clocks, and the ordering of events in a distributed system

Communications of the ACM
Programmer-Transparent Coordination of Recovering Concurrent Processes: Philosophy and Rules for Efficient Implementation

IEEE Transactions on Software Engineering
A message system supporting fault tolerance

SOSP '83 Proceedings of the ninth ACM symposium on Operating systems principles
Publishing: a reliable broadcast communication mechanism

SOSP '83 Proceedings of the ninth ACM symposium on Operating systems principles
Distributed system fault tolerance using message logging and checkpointing

Distributed system fault tolerance using message logging and checkpointing

Trade-offs in implementing causal message logging protocols

PODC '96 Proceedings of the fifteenth annual ACM symposium on Principles of distributed computing
Optimistic Crash Recovery without Changing Application Messages

IEEE Transactions on Parallel and Distributed Systems
Message Logging: Pessimistic, Optimistic, Causal, and Optimal

IEEE Transactions on Software Engineering
Causality tracking in causal message-logging protocols

Distributed Computing
Performance analysis of different checkpointing and recovery schemes using stochastic model

Journal of Parallel and Distributed Computing
Detecting Arbitrary Stable Properties Using Efficient Snapshots

IEEE Transactions on Software Engineering
Quantitative causality

Neural, Parallel & Scientific Computations


Abstract

Recovery from transient processor failures can be achieved by using optimistic message logging and checkpointing. The faulty processors roll back, and some or all of the non-faulty processors may also have to roll back. This paper formulates the rollback problem as a closure problem. A centralized closure algorithm is presented, together with two efficient distributed implementations. Several related problems are also considered, and distributed algorithms are presented for solving them.
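
To make the "closure" view of rollback concrete, here is a minimal centralized sketch in Python. It is not the algorithm from the paper, whose details are not given in this record. It assumes a simple, hypothetical model: each processor has a numbered sequence of recoverable states, state 0 is the initial state with no messages sent or received, and for every state the cumulative message counts per peer are known. The names `rollback_closure`, `sent`, `recv`, and `start` are illustrative only. The sketch repeatedly rolls a processor back while it has received more messages from a peer than that peer has sent in its current state, and stops at a fixed point.

```python
from typing import List

# counts[p][k][q]: cumulative number of messages exchanged between
# processor p and peer q up to p's k-th recoverable state (assumed model).
Counts = List[List[List[int]]]


def rollback_closure(sent: Counts, recv: Counts, start: List[int]) -> List[int]:
    """Return, per processor, the latest state index at or before `start`
    such that no processor has received more messages from a peer than
    that peer has sent in its own chosen state."""
    n = len(start)
    cur = list(start)
    changed = True
    while changed:  # iterate until no processor needs to roll back further
        changed = False
        for j in range(n):
            for i in range(n):
                if i == j:
                    continue
                # j holds an "orphan" message from i: roll j back one state.
                while cur[j] > 0 and recv[j][cur[j]][i] > sent[i][cur[i]][j]:
                    cur[j] -= 1
                    changed = True
    return cur


# Hypothetical run with three processors. P0 fails and restarts from its
# state 1, losing the second message it had sent to P1.
sent = [
    [[0, 0, 0], [0, 1, 0], [0, 2, 0]],             # P0 sends two messages to P1
    [[0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 1]],  # P1 sends one message to P2
    [[0, 0, 0], [0, 0, 0]],                        # P2 sends nothing
]
recv = [
    [[0, 0, 0], [0, 0, 0], [0, 0, 0]],             # P0 receives nothing
    [[0, 0, 0], [1, 0, 0], [2, 0, 0], [2, 0, 0]],  # P1 receives two from P0
    [[0, 0, 0], [0, 1, 0]],                        # P2 receives one from P1
]
print(rollback_closure(sent, recv, start=[1, 3, 1]))  # prints [1, 1, 0]
```

In this run the rollback of P0 forces P1 back to state 1, which in turn forces P2, a processor that never communicated with P0 directly, back to state 0. That cascade is what makes the problem a closure computation rather than a single local check.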