Using Time to Improve the Performance of Coordinated Checkpointing

Authors:
Nuno Neves; W. Kent Fuchs
Venue:
IPDS '96 Proceedings of the 2nd International Computer Performance and Dependability Symposium (IPDS '96)
Year:
1996

Citing 0
Cited 5

Adaptive recovery for mobile environments
Communications of the ACM

Process Recovery in Heterogeneous Systems
IEEE Transactions on Computers

Synergistic Coordination between Software and Hardware Fault Tolerance Techniques
DSN '01 Proceedings of the 2001 International Conference on Dependable Systems and Networks (formerly: FTCS)

An Adaptive Checkpointing Protocol to Bound Recovery Time with Message Logging
SRDS '99 Proceedings of the 18th IEEE Symposium on Reliable Distributed Systems

An efficient time-based checkpointing protocol for mobile computing systems over mobile IP
Mobile Networks and Applications - Mobile networking through IP

Quantified Score

Hi-index	0.02

Abstract

This paper describes and evaluates a coordinated checkpoint protocol that uses time to eliminate several performance overheads that are present in traditional protocols. The time-based protocol does not have to exchange coordination messages, does not need to add information to the processes' messages, and only accesses stable storage when checkpoints are saved. This protocol uses a simple initialization procedure to set checkpoint timers at the different processes. After the initialization, each process saves its state independently from the other processes. By disallowing processes from sending messages during an interval before the checkpoint time, the protocol prevents in-transit messages from occurring. Two coordinated checkpoint protocols were implemented on a CM5, and their performance was compared using several applications. Results showed that the time-based protocol outperforms the two-phase protocol in all applications.
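
The timer-and-blackout idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class name `CheckpointTimer`, the parameters `period` and `blackout`, and the fixed-length blackout are all assumptions made for illustration. The abstract does not say how the length of the no-send interval is chosen, so the sketch treats it as a given constant.

```python
from dataclasses import dataclass


@dataclass
class CheckpointTimer:
    """Hypothetical per-process timer; all names here are illustrative assumptions."""
    period: float       # seconds between consecutive checkpoints (assumed parameter)
    blackout: float     # length of the no-send interval before a checkpoint (assumed)
    start: float = 0.0  # local time at which the initialization procedure finished

    def next_checkpoint(self, now: float) -> float:
        """Return the local time of the first checkpoint strictly after `now`."""
        elapsed = now - self.start
        k = int(elapsed // self.period) + 1
        return self.start + k * self.period

    def may_send(self, now: float) -> bool:
        """True if `now` lies outside the blackout interval before the next checkpoint."""
        return self.next_checkpoint(now) - now > self.blackout


timer = CheckpointTimer(period=10.0, blackout=1.0)
print(timer.may_send(5.0))   # well before the checkpoint at t = 10
print(timer.may_send(9.5))   # inside the blackout interval
```

Each process would consult only its own timer, which matches the abstract's claim that no coordination messages or piggybacked information are needed after initialization. A real system would also have to account for clock differences between processes, which this sketch ignores.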