A Roll-Forward Recovery Scheme for Solving the Problem of Coasting Forward for Distributed Systems

Authors:
B. Gupta; S. K. Banerjee
Affiliations:
Southern Illinois University, Carbondale, IL; University of Calcutta, Calcutta, India
Venue:
ACM SIGOPS Operating Systems Review
Year:
2001

References:
21
Cited by:
1

Abstract

This paper proposes a new index-based hybrid checkpointing scheme to tackle the problem caused by "coasting forward". The scheme uses both incremental and communication-induced checkpoints. Failures are classified as hard or soft so that the recovery approaches designed for the two types can exploit the advantage that incremental checkpoints offer, namely a possible reduction in rollback. The theoretical results presented show that, for soft failures, the recovery approach offers maximum roll-forward.