An Efficient Index-Based Checkpointing Protocol with Constant-Size Control Information on Messages

Authors:
Jichiang Tsai
Venue:
IEEE Transactions on Dependable and Secure Computing
Year:
2005


Abstract

Communication-induced checkpointing (CIC) protocols can be used to prevent the domino effect. Among such protocols, those belonging to the index-based category have been shown to perform better. In this paper, we propose an efficient index-based CIC protocol. The fully informed (FI) protocol proposed in the literature is known to be the best index-based CIC protocol that one can achieve, since an optimal protocol would need to acquire information about the future. We discover that the enhancement adopted by FI rarely takes effect in practice. By discarding this enhancement, we obtain a new protocol, called NMMP. Simulation results show that our protocol is almost as efficient as FI in some typical computational environments. In particular, we demonstrate that the two protocols behave identically over a tree communication network. Surprisingly, NMMP needs to piggyback only control information of constant size on each message, regardless of the number of processes.
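To make the index-based idea concrete, the Python sketch below shows the basic forced-checkpoint rule shared by simple index-based CIC protocols (in the style of Briatico, Ciuffoletti, and Simoncini). It is not NMMP or FI, whose extra rules are not described in this abstract. Each process keeps an integer index, increments it at every basic (autonomous) checkpoint, piggybacks it on every outgoing message, and takes a forced checkpoint before delivering a message whose index exceeds its own. The single piggybacked integer illustrates why control information in this family can stay constant-size. The names `Process`, `Message`, `take_basic_checkpoint`, `send`, and `receive` are hypothetical and chosen for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    sender: int
    payload: object
    index: int  # the only piggybacked control information: one integer


@dataclass
class Process:
    """Hypothetical process running a basic index-based CIC rule (not NMMP/FI)."""
    pid: int
    index: int = 0  # index of the latest local checkpoint
    checkpoints: list[tuple[int, str]] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Every process starts with an initial checkpoint carrying index 0.
        self.checkpoints.append((0, "initial"))

    def take_basic_checkpoint(self) -> None:
        # Basic (autonomous) checkpoint: advance the local index.
        self.index += 1
        self.checkpoints.append((self.index, "basic"))

    def send(self, payload: object) -> Message:
        # Piggyback the current index; its size does not depend on the
        # number of processes.
        return Message(self.pid, payload, self.index)

    def receive(self, msg: Message) -> object:
        # Forced checkpoint: if the sender is ahead, checkpoint *before*
        # delivering the message and adopt the sender's index.
        if msg.index > self.index:
            self.index = msg.index
            self.checkpoints.append((self.index, "forced"))
        return msg.payload


if __name__ == "__main__":
    p0, p1 = Process(0), Process(1)
    p0.take_basic_checkpoint()        # p0 moves to index 1
    p1.receive(p0.send("hello"))      # p1 is forced to checkpoint with index 1
    p0.receive(p1.send("reply"))      # equal indices: no forced checkpoint
    print(p0.checkpoints)
    print(p1.checkpoints)
```

Taking the forced checkpoint before delivery is the design choice that keeps checkpoints with the same index mutually consistent: a message sent after a sender's index-k checkpoint can never be received before the receiver's index-k checkpoint.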