The vast computing potential of mobile computing systems is often hampered by their susceptibility to transient and independent failures. Checkpoint-based rollback recovery is one of the most widely used techniques for adding reliability and high availability to such systems, and it is applied in scientific computing, database, telecommunication, and mission-critical applications. This paper presents a coordinated, nonblocking checkpointing and recovery technique for mobile computing systems that efficiently handles the constraints imposed by the underlying wireless network. An initiator mobile support station (MSS) sends checkpoint requests to all other MSSs, and each MSS forwards the request only to those mobile hosts (MHs) that have communicated during the last checkpointing interval, which relieves the wireless network of synchronization overhead. In addition, all acknowledged messages are logged at the home station of the receiving MH, so that after a failure only the faulty MHs need to recover; no other process is affected by the fault or by the subsequent recovery.
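The two mechanisms described above can be illustrated with a minimal Python sketch. It is not the paper's protocol and rests on several simplifying assumptions. All class and method names (`MobileHost`, `MobileSupportStation`, `MobileNetwork`, `on_checkpoint_request`, `recover`) are hypothetical. An MH's state is modelled as the sum of the integer payloads it has received. Delivery and acknowledgement are treated as a single synchronous step. The nonblocking aspects, such as messages in transit while a checkpoint is being taken, are not modelled. The sketch shows three things: the initiator reaching every MSS, each MSS asking only the MHs that communicated during the interval, and a single faulty MH rolling back and replaying the messages logged at its home station.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    src: str
    dst: str
    payload: int


@dataclass
class MobileHost:
    """Hypothetical MH; its state is simply the sum of delivered payloads."""
    name: str
    state: int = 0
    checkpoint: int = 0

    def apply(self, msg: Message) -> None:
        self.state += msg.payload

    def take_checkpoint(self) -> None:
        self.checkpoint = self.state


@dataclass
class MobileSupportStation:
    """Hypothetical MSS acting as the home station of the MHs in `hosts`."""
    name: str
    hosts: dict = field(default_factory=dict)   # MH name -> MobileHost
    active: set = field(default_factory=set)    # MHs that communicated this interval
    log: dict = field(default_factory=dict)     # MH name -> acked messages since its checkpoint

    def note_send(self, sender: str) -> None:
        self.active.add(sender)

    def deliver(self, msg: Message) -> None:
        """Deliver to a local MH, then log the (assumed acknowledged) message here."""
        self.hosts[msg.dst].apply(msg)
        self.log.setdefault(msg.dst, []).append(msg)
        self.active.add(msg.dst)

    def on_checkpoint_request(self) -> list:
        """Forward the request only to MHs that communicated in the last interval."""
        asked = sorted(self.active)
        for name in asked:
            self.hosts[name].take_checkpoint()
            self.log[name] = []                 # entries before the new checkpoint are no longer needed
        self.active.clear()
        return asked

    def recover(self, name: str) -> None:
        """Roll back one faulty MH and replay its logged messages; other MHs are untouched."""
        mh = self.hosts[name]
        mh.state = mh.checkpoint
        for msg in self.log.get(name, []):
            mh.apply(msg)


class MobileNetwork:
    """Hypothetical wiring of MSSs; maps every MH to its home MSS."""

    def __init__(self, stations):
        self.stations = {s.name: s for s in stations}
        self.home = {h: s for s in stations for h in s.hosts}

    def send(self, src: str, dst: str, payload: int) -> None:
        self.home[src].note_send(src)
        self.home[dst].deliver(Message(src, dst, payload))

    def checkpoint(self, initiator: str) -> dict:
        """The initiator MSS sends the request to every MSS, including itself."""
        if initiator not in self.stations:
            raise KeyError(initiator)
        return {name: mss.on_checkpoint_request() for name, mss in self.stations.items()}
```

In a short run with `mh1` and `mh2` homed at `mss1` and `mh3` homed at `mss2`, a single message from `mh1` to `mh3` means a checkpoint initiated by `mss1` reaches both MSSs but asks only `mh1` and `mh3`. If `mh3` later fails, `mss2.recover("mh3")` restores its checkpoint and replays its logged messages without involving `mh1` or `mh2`. Keeping the log at the receiver's home station is what confines recovery to the faulty MH in this sketch.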