Optimistic recovery in distributed systems
ACM Transactions on Computer Systems (TOCS)
Checkpointing and Rollback-Recovery for Distributed Systems
IEEE Transactions on Software Engineering - Special issue on distributed systems
Information Processing Letters
A network architecture providing host migration transparency
SIGCOMM '91 Proceedings of the conference on Communications architecture & protocols
Use of Common Time Base for Checkpointing and Rollback Recovery in a Distributed System
IEEE Transactions on Software Engineering
Checkpointing and rollback-recovery algorithms in distributed systems
Journal of Systems and Software - Special issue on fault tolerance in real-time systems
Necessary and Sufficient Conditions for Consistent Global Snapshots
IEEE Transactions on Parallel and Distributed Systems
Distributed snapshots: determining global states of distributed systems
ACM Transactions on Computer Systems (TOCS)
Low-Cost Checkpointing and Failure Recovery in Mobile Computing Systems
IEEE Transactions on Parallel and Distributed Systems
On Coordinated Checkpointing in Distributed Systems
IEEE Transactions on Parallel and Distributed Systems
Staggered Consistent Checkpointing
IEEE Transactions on Parallel and Distributed Systems
Time, clocks, and the ordering of events in a distributed system
Communications of the ACM
Checkpointing distributed applications on mobile computers
PDIS '94 Proceedings of the third international conference on Parallel and distributed information systems
The Challenges of Mobile Computing
Computer
An Efficient Protocol for Checkpointing Recovery in Distributed Systems
IEEE Transactions on Parallel and Distributed Systems
Concurrent Robust Checkpointing and Recovery in Distributed Systems
Proceedings of the Fourth International Conference on Data Engineering
Experimental Evaluation of Concurrency Checkpointing and Rollback-Recovery Algorithms
Proceedings of the Sixth International Conference on Data Engineering
ICPP '98 Proceedings of the 1998 International Conference on Parallel Processing
Notes on Data Base Operating Systems
Operating Systems, An Advanced Course
A low-overhead recovery technique using quasi-synchronous checkpointing
ICDCS '96 Proceedings of the 16th International Conference on Distributed Computing Systems
Low-Cost Checkpointing with Mutable Checkpoints in Mobile Computing Systems
ICDCS '98 Proceedings of the 18th International Conference on Distributed Computing Systems
Distributed system fault tolerance using message logging and checkpointing
IEEE Communications Magazine
IEEE 802.11 Wireless Local Area Networks
IEEE Communications Magazine
On improving the performance of cache invalidation in mobile environments
Mobile Networks and Applications
Correction to "Mutable Checkpoints: A New Checkpointing Approach for Mobile Computing Systems"
IEEE Transactions on Parallel and Distributed Systems
Euro-Par '02 Proceedings of the 8th International Euro-Par Conference on Parallel Processing
Error detection in large-scale parallel programs with long runtimes
Future Generation Computer Systems - Tools for program development and analysis
Fault management in mobile computing
Ubiquity
An efficient time-based checkpointing protocol for mobile computing systems over mobile IP
Mobile Networks and Applications - Mobile networking through IP
A causal message logging protocol for mobile nodes in mobile computing systems
Future Generation Computer Systems - Special issue: Advanced services for clusters and internet computing
Collaborative backup for dependable mobile applications
MPAC '04 Proceedings of the 2nd workshop on Middleware for pervasive and ad-hoc computing
A New Approach for High Performance Computing Systems with Various Checkpointing Schemes
The Journal of Supercomputing
Design and analysis of a fault tolerant hybrid mobile scheme
Information Sciences: an International Journal
A synchronous checkpointing protocol for mobile distributed systems: probabilistic approach
International Journal of Information and Computer Security
A novel non-block synchronous checkpointing scheme for distributed systems
ICS'05 Proceedings of the 9th WSEAS International Conference on Systems
A low-cost hybrid coordinated checkpointing protocol for mobile distributed systems
Mobile Information Systems
Novel Crash Recovery Approach for Concurrent Failures in Cluster Federation
GPC '09 Proceedings of the 4th International Conference on Advances in Grid and Pervasive Computing
A novel low-overhead recovery approach for distributed systems
Journal of Computer Systems, Networks, and Communications
Context-aware fault tolerance in migratory services
Proceedings of the 5th Annual International Conference on Mobile and Ubiquitous Systems: Computing, Networking, and Services
A weighted checkpointing protocol for mobile distributed systems
International Journal of Ad Hoc and Ubiquitous Computing
A consistent checkpointing-recovery protocol for minimal number of nodes in mobile computing system
HiPC'07 Proceedings of the 14th international conference on High performance computing
New & efficient low overheads algorithm for mobile distributed systems
Proceedings of the International Conference & Workshop on Emerging Trends in Technology
A proxy based efficient checkpointing scheme for fault recovery in mobile grid system
HiPC'06 Proceedings of the 13th international conference on High Performance Computing
Performance evaluation of parallel systems employing roll-forward checkpoint schemes
ICCSA'06 Proceedings of the 2006 international conference on Computational Science and Its Applications - Volume Part V
A low-overhead non-block checkpointing algorithm for mobile computing environment
GPC'06 Proceedings of the First international conference on Advances in Grid and Pervasive Computing
Soft-Checkpointing Based Hybrid Synchronous Checkpointing Protocol for Mobile Distributed Systems
International Journal of Distributed Systems and Technologies
Mobile computing raises many new issues, such as lack of stable storage, low bandwidth of wireless channels, high mobility, and limited battery life. These issues make traditional checkpointing algorithms unsuitable. Coordinated checkpointing is an attractive approach for transparently adding fault tolerance to distributed applications, since it avoids domino effects and minimizes the stable storage requirement. However, it suffers from the high overhead associated with the checkpointing process in mobile computing systems. Two approaches have been used to reduce the overhead: the first is to minimize the number of synchronization messages and the number of checkpoints; the other is to make the checkpointing process nonblocking. These two approaches were orthogonal until the Prakash-Singhal algorithm [28] combined them. However, we [8] found that this algorithm may result in an inconsistency in some situations, and we proved that there does not exist a nonblocking algorithm that forces only a minimum number of processes to take their checkpoints. In this paper, we introduce the concept of a "mutable checkpoint," which is neither a tentative checkpoint nor a permanent checkpoint, to design efficient checkpointing algorithms for mobile computing systems. Mutable checkpoints can be saved anywhere, e.g., in the main memory or on the local disk of mobile hosts (MHs). In this way, taking a mutable checkpoint avoids the overhead of transferring large amounts of data to the stable storage at mobile support stations (MSSs) over the wireless network. We present techniques to minimize the number of mutable checkpoints. Simulation results show that the overhead of taking mutable checkpoints is negligible. Based on mutable checkpoints, our nonblocking algorithm avoids the avalanche effect and forces only a minimum number of processes to take their checkpoints on the stable storage.
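The cost argument behind mutable checkpoints can be illustrated with a minimal sketch. It is not the paper's algorithm: the class names, the `promote` and `discard` operations, and the byte counter are all hypothetical, and the sketch assumes that a mutable checkpoint is later either written to stable storage at an MSS or simply dropped. It shows only that a checkpoint kept locally on an MH costs no wireless transfer unless it is actually moved to stable storage.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class StableStorage:
    """Hypothetical stand-in for stable storage at a mobile support station (MSS)."""
    checkpoints: Dict[str, bytes] = field(default_factory=dict)
    bytes_over_wireless: int = 0  # illustrative counter, not from the paper

    def save(self, host_id: str, state: bytes) -> None:
        # Writing to the MSS means sending the state over the wireless link.
        self.checkpoints[host_id] = state
        self.bytes_over_wireless += len(state)


@dataclass
class MobileHost:
    """Hypothetical mobile host (MH) that can hold one local mutable checkpoint."""
    host_id: str
    state: bytes = b""
    mutable: Optional[bytes] = None

    def take_mutable_checkpoint(self) -> None:
        # Saved in local memory only: no wireless traffic is generated.
        self.mutable = self.state

    def promote(self, mss: StableStorage) -> None:
        # Assumed operation: move the local checkpoint to stable storage.
        if self.mutable is None:
            raise ValueError("no mutable checkpoint to promote")
        mss.save(self.host_id, self.mutable)
        self.mutable = None

    def discard(self) -> None:
        # Assumed operation: drop a checkpoint that turned out to be unnecessary.
        self.mutable = None


mss = StableStorage()
host_a = MobileHost("mh-a", state=b"x" * 1000)
host_b = MobileHost("mh-b", state=b"y" * 1000)

host_a.take_mutable_checkpoint()
host_b.take_mutable_checkpoint()
host_a.promote(mss)   # only this checkpoint crosses the wireless link
host_b.discard()      # this one never leaves the mobile host
```

In this sketch, two hosts take checkpoints but only one state is transferred, which mirrors the claim that unnecessary checkpoints need not reach stable storage.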