An asynchronous recovery algorithm based on a staggered quasi-synchronous checkpointing algorithm

Authors:
D. Manivannan;Q. Jiang;J. Yang;K. E. Persson;M. Singhal
Affiliations:
Computer Science Department, University of Kentucky, Lexington, KY;Computer Science Department, University of Kentucky, Lexington, KY;Computer Science Department, University of Kentucky, Lexington, KY;Computer Science Department, University of Kentucky, Lexington, KY;Computer Science Department, University of Kentucky, Lexington, KY
Venue:
IWDC'05 Proceedings of the 7th international conference on Distributed Computing
Year:
2005

Abstract

Checkpointing and rollback recovery are established techniques for handling failures in distributed systems. Under synchronous checkpointing, all processes involved in the distributed computation take their checkpoints almost simultaneously. This causes contention for network stable storage and hence degrades performance. To overcome this problem, checkpoint staggering, under which the processes take their checkpoints in a staggered manner, has been proposed. In this paper, we propose a staggered quasi-synchronous checkpointing algorithm that reduces contention for network stable storage without any synchronization overhead. We also present an asynchronous recovery algorithm based on this checkpointing algorithm.