This paper presents a causal message logging protocol with independent checkpointing for mobile nodes. Its aim is to handle efficiently the main constraints of such nodes: mobility and disconnection, limited battery life, small storage capacity, and low bandwidth on the wireless link. To this end, the protocol includes a low-cost failure-free mechanism that only requires locating, during handoff, the mobility agent that holds the latest checkpoint of each process on a mobile node. With this mechanism, only the latest checkpoint needs to be kept on stable storage, and the failure-free overhead stays low. The protocol also uses two garbage collection schemes to remove the log information of mobile nodes. The first scheme lets each mobile node autonomously remove useless log information from its storage by piggybacking a small amount of additional information on application messages, without any extra messages or forced checkpoints. The second scheme lets a mobile node remove part of the log information in its storage if more free space is still required after the first scheme has run. It uses the size of each process's log information to reduce the number of processes that must participate in garbage collection. Simulation results show that the two proposed schemes significantly reduce garbage collection overhead compared with traditional schemes. Additionally, we present an efficient recovery algorithm that avoids frequent stable storage accesses, imposes no restriction on the execution of live processes during recovery, and ensures consistent recovery when integrated with independent checkpointing.
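The abstract does not say how the second garbage collection scheme uses log sizes to limit the number of participating processes. As an illustration only, the following minimal sketch assumes a greedy policy: pick the processes whose logs occupy the most space until the requested amount of storage would be freed. The function name `select_gc_participants`, the `log_sizes` mapping, and the `bytes_needed` parameter are hypothetical and are not taken from the paper.

```python
from typing import Dict, List


def select_gc_participants(log_sizes: Dict[str, int], bytes_needed: int) -> List[str]:
    """Pick a small set of processes whose logs, if removed, free enough space.

    Assumption (not from the paper): a greedy, largest-log-first policy.
    log_sizes maps a process id to the bytes of its log information held
    in the mobile node's storage.
    """
    chosen: List[str] = []
    freed = 0
    # Visit processes from largest log to smallest; ties are broken by
    # process id so the result is deterministic.
    for pid, size in sorted(log_sizes.items(), key=lambda kv: (-kv[1], kv[0])):
        if freed >= bytes_needed:
            break  # enough space would already be freed
        if size <= 0:
            continue  # nothing to reclaim from this process
        chosen.append(pid)
        freed += size
    return chosen


if __name__ == "__main__":
    sizes = {"p1": 100, "p2": 400, "p3": 250}
    print(select_gc_participants(sizes, 500))
```

Under this assumed policy, a request for 500 bytes involves only the two processes with the largest logs, so the third process is not contacted at all. If the request exceeds the total log size, every process with a non-empty log is selected.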