An abstract model of rollback recovery control in distributed systems

Authors:
Jiannong Cao; K. C. Wang
Venue:
ACM SIGOPS Operating Systems Review
Year:
1992

Abstract

This paper develops an abstract model that provides a uniform way to describe different rollback recovery control algorithms for distributed systems. We first give a general definition of the distributed rollback recovery control problem. The concept of a distributed recovery control system (DRC system), consisting of distributed recovery control units (DRC units), is proposed to model recovery at various control granularities. We then develop a graph model, called the dependency graph, for distributed rollback recovery control algorithms. An atomic subgraph is defined as a subgraph induced by a set of nodes that has no outgoing arcs to nodes outside the set. Committing and aborting atomic actions can be modeled as identifying atomic subgraphs. Next, we define two kinds of dependency graphs, checkpoint graphs and unit graphs, based on the dependency relation defined by rollback propagation. We show that various types of distributed recovery control algorithms can be classified according to how atomic subgraphs are identified in these two graphs. Using the model may therefore allow us to describe existing algorithms in a uniform way and, more importantly, to find new algorithms.
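
The atomic-subgraph definition in the abstract can be illustrated with a small sketch. It is not taken from the paper: the adjacency-list representation, the function names `is_atomic` and `atomic_closure`, the node labels, and the reading of an arc `a -> b` as "a depends on b" are all assumptions made for illustration. The only part drawn from the abstract is the condition itself, namely that a node set induces an atomic subgraph when no arc leaves the set.

```python
from collections import deque


def is_atomic(graph, nodes):
    """Return True if no arc leaves `nodes` for a node outside the set."""
    node_set = set(nodes)
    return all(
        succ in node_set
        for node in node_set
        for succ in graph.get(node, ())
    )


def atomic_closure(graph, start):
    """Return the smallest atomic node set containing `start`.

    This is simply the set of nodes reachable from `start`, found by
    breadth-first search.
    """
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for succ in graph.get(node, ()):
            if succ not in seen:
                seen.add(succ)
                queue.append(succ)
    return seen


# Hypothetical dependency graph (labels are illustrative only).
# An arc a -> b is read here as "a depends on b"; this reading is an
# assumption of the sketch, not a definition from the paper.
example_graph = {
    "c1": ["c2"],
    "c2": ["c3"],
    "c3": [],
    "c4": ["c1"],
}

if __name__ == "__main__":
    print(is_atomic(example_graph, {"c2", "c3"}))
    print(is_atomic(example_graph, {"c1"}))
    print(sorted(atomic_closure(example_graph, "c1")))
```

Under these assumptions, checking a candidate set is linear in the number of arcs leaving it, and the closure gives the minimal set of nodes that would have to be treated together with a given node.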