Checkpointing reduces loss of computation in the presence of failures. Two metrics characterize a checkpointing scheme: checkpoint overhead and checkpoint latency. This paper shows that a large increase in latency is acceptable if it is accompanied by a relatively small reduction in overhead. Also, for equidistant checkpoints, the optimal checkpoint interval is shown to be typically independent of checkpoint latency.
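Both claims can be illustrated numerically. The sketch below is not taken from the abstract; it assumes a common exponential-failure model in which failures arrive at rate `lam`, each interval holds `T` units of useful work plus checkpoint overhead `C`, a checkpoint becomes usable a latency `L` after it starts, and recovery takes `R`. Under that assumption the expected time per interval is `exp(lam*(L - C + R)) * (exp(lam*(T + C)) - 1) / lam`. All function names and parameter values are hypothetical.

```python
import math


def overhead_ratio(T, C, L, R, lam):
    """Relative slowdown (expected_time / T - 1) under the assumed model."""
    # Assumed model: restart cost after a failure is R + (L - C),
    # and the failure-free interval length is T + C.
    expected = math.exp(lam * (L - C + R)) * math.expm1(lam * (T + C)) / lam
    return expected / T - 1.0


def optimal_interval(C, L, R, lam, iters=200):
    """Golden-section search for the T that minimizes overhead_ratio."""
    a, b = 1e-3, 10.0 / lam  # bracket; the minimizer lies below 1/lam
    phi = (math.sqrt(5.0) - 1.0) / 2.0
    for _ in range(iters):
        c = b - phi * (b - a)
        d = a + phi * (b - a)
        if overhead_ratio(c, C, L, R, lam) < overhead_ratio(d, C, L, R, lam):
            b = d
        else:
            a = c
    return (a + b) / 2.0


def young_interval(C, lam):
    """Young's first-order approximation, which ignores latency entirely."""
    return math.sqrt(2.0 * C / lam)


if __name__ == "__main__":
    lam, R = 1e-4, 10.0  # illustrative: one failure per 10,000 s on average
    for L in (10.0, 100.0, 1000.0):
        T = optimal_interval(10.0, L, R, lam)
        print(f"C=10 L={L:6.0f}  T_opt={T:8.2f}  "
              f"ratio={overhead_ratio(T, 10.0, L, R, lam):.4f}")
    # Trade-off: halve the overhead while latency grows tenfold.
    T = optimal_interval(5.0, 100.0, R, lam)
    print(f"C= 5 L=   100  T_opt={T:8.2f}  "
          f"ratio={overhead_ratio(T, 5.0, 100.0, R, lam):.4f}")
    print(f"Young's approximation for C=10: {young_interval(10.0, lam):.2f}")
```

In this model the latency enters only through a multiplicative factor that does not depend on `T`, so the minimizing interval is the same for every `L`, while the overhead ratio itself grows with `L`. The last configuration suggests how a lower overhead can offset a much larger latency, but the specific numbers depend entirely on the assumed parameters.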