Performance analysis of different checkpointing and recovery schemes using stochastic model

Authors:
Partha Sarathi Mandal;Krishnendu Mukhopadhyaya
Affiliations:
Advanced Computing and Microelectronics Unit, Indian Statistical Institute, 203 B.T. Road, Kolkata 700108, India;Advanced Computing and Microelectronics Unit, Indian Statistical Institute, 203 B.T. Road, Kolkata 700108, India
Venue:
Journal of Parallel and Distributed Computing
Year:
2006

Citing 21
Cited 2

A quasi-synchronous checkpointing algorithm that prevents contention for stable storage

Information Sciences: an International Journal

Abstract

Several schemes for checkpointing and rollback recovery have been reported in the literature. In this paper, we analyze some of these schemes under a stochastic model. We derive expressions for the average cost of checkpointing, rollback recovery, message logging, and piggybacking on application messages under both synchronous and asynchronous checkpointing. For quasi-synchronous checkpointing, we show that in a system with n processes, the upper and lower bounds for selective message logging are O(n^2) and O(n), respectively.