This paper proposes a new index-based hybrid checkpointing scheme to tackle the problem caused by "coasting forward". The scheme uses both incremental and communication-induced checkpoints. Failures are classified as hard or soft so that the recovery approaches designed for the two types can exploit the advantage that incremental checkpoints offer, namely a possible reduction in rollback. The theoretical results presented show that, for soft failures, the recovery approach offers maximum roll-forward.