A consistent checkpointing algorithm saves a consistent view of a distributed application's state on stable storage. Traditional consistent checkpointing algorithms require the processes to save their states at about the same time. This causes contention for the stable storage, which can result in large overheads. Staggering the checkpoints taken by the various processes can reduce this overhead. This paper presents a simple approach that allows the checkpoints to be staggered arbitrarily. The approach requires processes to take consistent logical checkpoints, rather than the consistent physical checkpoints enforced by existing algorithms. Experimental results on the nCube-2 are presented.
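
The contention argument can be illustrated with a toy model. The sketch below is not the paper's algorithm or its measurement setup. It assumes a single stable-storage device whose bandwidth is shared equally among concurrent writers, equal-sized checkpoints, and processes that block while writing. The function names and parameters are hypothetical.

```python
def simultaneous_latency(num_procs, ckpt_size, bandwidth):
    """Per-process write time when all processes checkpoint at once.

    Assumption: the storage bandwidth is split equally among the
    writers, so every process finishes at the same (late) time.
    """
    return [num_procs * ckpt_size / bandwidth] * num_procs


def staggered_latency(num_procs, ckpt_size, bandwidth):
    """Per-process write time when checkpoints are taken one after another.

    Assumption: writes do not overlap, so each process gets the
    full storage bandwidth while it is writing.
    """
    return [ckpt_size / bandwidth] * num_procs


if __name__ == "__main__":
    # Hypothetical numbers: 4 processes, 8 MB checkpoints, 2 MB/s storage.
    print(simultaneous_latency(4, 8.0, 2.0))  # 16.0 s per process
    print(staggered_latency(4, 8.0, 2.0))     # 4.0 s per process
```

In this model the storage device is busy for the same total time in both cases. What staggering changes is how long each individual process spends writing its checkpoint, which is the overhead the abstract refers to.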