FINE: A Fully Informed aNd Efficient communication-induced checkpointing protocol for distributed systems

Authors:
Yi Luo; D. Manivannan
Affiliations:
Department of Computer Science, University of Kentucky, Lexington, KY 40506, USA; Department of Computer Science, University of Kentucky, Lexington, KY 40506, USA
Venue:
Journal of Parallel and Distributed Computing
Year:
2009

Citing 23
Cited 1

Checkpointing and Rollback-Recovery for Distributed Systems

IEEE Transactions on Software Engineering - Special issue on distributed systems
Logical Time in Distributed Computing Systems

Computer - Distributed computing systems: separate resources acting as one
Consistent global checkpoints based on direct dependency tracking

Information Processing Letters
Necessary and Sufficient Conditions for Consistent Global Snapshots

IEEE Transactions on Parallel and Distributed Systems
Consistent Global Checkpoints that Contain a Given Set of Local Checkpoints

IEEE Transactions on Computers
An Index-Based Checkpointing Algorithm for Autonomous Distributed Systems

IEEE Transactions on Parallel and Distributed Systems
Evaluations of domino-free communication-induced checkpointing protocols

Information Processing Letters
Quasi-Synchronous Checkpointing: Models, Characterization, and Classification

IEEE Transactions on Parallel and Distributed Systems
Communication-Induced Determination of Consistent Snapshots

IEEE Transactions on Parallel and Distributed Systems
Fail-stop processors: an approach to designing fault-tolerant computing systems

ACM Transactions on Computer Systems (TOCS)
Time, clocks, and the ordering of events in a distributed system

Communications of the ACM
On the no-z-cycle property in distributed executions

Journal of Computer and System Sciences
A survey of rollback-recovery protocols in message-passing systems

ACM Computing Surveys (CSUR)
Virtual Precedence in Asynchronous Systems: Concept and Applications

WDAG '97 Proceedings of the 11th International Workshop on Distributed Algorithms
An Analysis of Communication-Induced Checkpointing

FTCS '99 Proceedings of the Twenty-Ninth Annual International Symposium on Fault-Tolerant Computing
A VP-Accordant Checkpointing Protocol Preventing Useless Checkpoints

SRDS '98 Proceedings of the 17th IEEE Symposium on Reliable Distributed Systems
A low-overhead recovery technique using quasi-synchronous checkpointing

ICDCS '96 Proceedings of the 16th International Conference on Distributed Computing Systems (ICDCS '96)
Distributed Recovery with K-Optimistic Logging

ICDCS '97 Proceedings of the 17th International Conference on Distributed Computing Systems (ICDCS '97)
Progressive Construction of Consistent Global Checkpoints

ICDCS '99 Proceedings of the 19th IEEE International Conference on Distributed Computing Systems
Communication-based prevention of useless checkpoints in distributed computations

Distributed Computing
On the Fully-Informed Communication-Induced Checkpointing Protocol

PRDC '05 Proceedings of the 11th Pacific Rim International Symposium on Dependable Computing
An enhanced model-based checkpointing protocol

PDCN'07 Proceedings of the 25th IASTED International Multi-Conference: parallel and distributed computing and networks
FINE: A Fully Informed aNd Efficient Communication-Induced Checkpointing Protocol

ICONS '08 Proceedings of the Third International Conference on Systems

HOPE: A Hybrid Optimistic checkpointing and selective Pessimistic mEssage logging protocol for large scale distributed systems

Future Generation Computer Systems

Abstract

Communication-induced checkpointing (CIC) protocols are classified in the literature into two categories: index-based and model-based. In this paper, we discuss two data structures used in these two kinds of CIC protocols and their different roles in helping the checkpointing algorithms enforce the Z-cycle-free (ZCF) property. We then present our Fully Informed aNd Efficient (FINE) communication-induced checkpointing algorithm, which has not only less checkpointing overhead than the well-known Fully Informed (FI) CIC protocol proposed by Helary et al. but also less message overhead. Performance evaluation indicates that our protocol performs better than many other existing CIC protocols.
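
For readers unfamiliar with the index-based category named in the abstract, the following is a minimal sketch of the classic index-based forced-checkpoint rule, in which each process piggybacks its checkpoint sequence number on outgoing messages and a receiver takes a forced checkpoint before delivering a message that carries a larger number. It is not the FINE or FI protocol, whose data structures and conditions are not given in this record. The names `Process`, `basic_checkpoint`, `send`, and `receive` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Process:
    """One process following a simple index-based CIC rule (illustrative only)."""
    pid: int
    index: int = 0  # sequence number of this process's latest checkpoint
    checkpoints: list = field(default_factory=list)  # (index, kind) pairs

    def __post_init__(self):
        # Every process starts with an initial checkpoint at index 0.
        self.checkpoints.append((self.index, "initial"))

    def basic_checkpoint(self):
        """Checkpoint taken independently, e.g. on a local timer."""
        self.index += 1
        self.checkpoints.append((self.index, "basic"))

    def send(self):
        """Return the control information piggybacked on an outgoing message."""
        return self.index

    def receive(self, piggybacked_index):
        """Take a forced checkpoint before delivery if the sender is ahead.

        Returns True when a forced checkpoint was taken.
        """
        forced = piggybacked_index > self.index
        if forced:
            self.index = piggybacked_index
            self.checkpoints.append((self.index, "forced"))
        return forced


# Hypothetical two-process run.
p0, p1 = Process(0), Process(1)
p0.basic_checkpoint()            # p0 advances to index 1
took_forced = p1.receive(p0.send())  # p1 is behind, so it is forced to checkpoint
```

In this run `p1` catches up to index 1 before delivering the message, so the checkpoints with index 1 on both processes are not separated by an orphan message. Reducing how often such forced checkpoints are needed is the kind of overhead the abstract refers to.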