Mutable Checkpoints: A New Checkpointing Approach for Mobile Computing Systems

Authors:
Guohong Cao;Mukesh Singhal
Affiliations:
Pennsylvania State Univ., University Park;Ohio State Univ., Columbus
Venue:
IEEE Transactions on Parallel and Distributed Systems
Year:
2001

Citing 24
Cited 24

Optimistic recovery in distributed systems

ACM Transactions on Computer Systems (TOCS)
Checkpointing and Rollback-Recovery for Distributed Systems

IEEE Transactions on Software Engineering - Special issue on distributed systems
On distributed snapshots

Information Processing Letters
A network architecture providing host migration transparency

SIGCOMM '91 Proceedings of the conference on Communications architecture & protocols
Use of Common Time Base for Checkpointing and Rollback Recovery in a Distributed System

IEEE Transactions on Software Engineering
Checkpointing and rollback-recovery algorithms in distributed systems

Journal of Systems and Software - Special issue on fault tolerance in real-time systems
Necessary and Sufficient Conditions for Consistent Global Snapshots

IEEE Transactions on Parallel and Distributed Systems
Distributed snapshots: determining global states of distributed systems

ACM Transactions on Computer Systems (TOCS)
Low-Cost Checkpointing and Failure Recovery in Mobile Computing Systems

IEEE Transactions on Parallel and Distributed Systems
On Coordinated Checkpointing in Distributed Systems

IEEE Transactions on Parallel and Distributed Systems
Staggered Consistent Checkpointing

IEEE Transactions on Parallel and Distributed Systems
Time, clocks, and the ordering of events in a distributed system

Communications of the ACM
Checkpointing distributed applications on mobile computers

PDIS '94 Proceedings of the third international conference on Parallel and distributed information systems
The Challenges of Mobile Computing

Computer
An Efficient Protocol for Checkpointing Recovery in Distributed Systems

IEEE Transactions on Parallel and Distributed Systems
Concurrent Robust Checkpointing and Recovery in Distributed Systems

Proceedings of the Fourth International Conference on Data Engineering
Experimental Evaluation of Concurrent Checkpointing and Rollback-Recovery Algorithms

Proceedings of the Sixth International Conference on Data Engineering
On the Impossibility of Min-Process Non-Blocking Checkpointing and An Efficient Checkpointing Algorithm for Mobile Computing Systems

ICPP '98 Proceedings of the 1998 International Conference on Parallel Processing
Notes on Data Base Operating Systems

Operating Systems, An Advanced Course
A low-overhead recovery technique using quasi-synchronous checkpointing

ICDCS '96 Proceedings of the 16th International Conference on Distributed Computing Systems (ICDCS '96)
Low-Cost Checkpointing with Mutable Checkpoints in Mobile Computing Systems

ICDCS '98 Proceedings of the 18th International Conference on Distributed Computing Systems
Distributed system fault tolerance using message logging and checkpointing

Distributed system fault tolerance using message logging and checkpointing
Mobile IP

IEEE Communications Magazine
IEEE 802.11 Wireless Local Area Networks

IEEE Communications Magazine

On improving the performance of cache invalidation in mobile environments

Mobile Networks and Applications
Correction to "Mutable Checkpoints: A New Checkpointing Approach for Mobile Computing Systems"

IEEE Transactions on Parallel and Distributed Systems
An Efficient Time-Based Checkpointing Protocol for Mobile Computing Systems over Wide Area Networks (Research Note)

Euro-Par '02 Proceedings of the 8th International Euro-Par Conference on Parallel Processing
Error detection in large-scale parallel programs with long runtimes

Future Generation Computer Systems - Tools for program development and analysis
Fault management in mobile computing

Ubiquity
An efficient time-based checkpointing protocol for mobile computing systems over mobile IP

Mobile Networks and Applications - Mobile networking through IP
A causal message logging protocol for mobile nodes in mobile computing systems

Future Generation Computer Systems - Special issue: Advanced services for clusters and internet computing
Collaborative backup for dependable mobile applications

MPAC '04 Proceedings of the 2nd workshop on Middleware for pervasive and ad-hoc computing
A New Approach for High Performance Computing Systems with Various Checkpointing Schemes

The Journal of Supercomputing
Design and analysis of a fault tolerant hybrid mobile scheme

Information Sciences: an International Journal
A synchronous checkpointing protocol for mobile distributed systems: probabilistic approach

International Journal of Information and Computer Security
A novel non-block synchronous checkpointing scheme for distributed systems

ICS'05 Proceedings of the 9th WSEAS International Conference on Systems
A low-cost hybrid coordinated checkpointing protocol for mobile distributed systems

Mobile Information Systems
Novel Crash Recovery Approach for Concurrent Failures in Cluster Federation

GPC '09 Proceedings of the 4th International Conference on Advances in Grid and Pervasive Computing
A novel low-overhead recovery approach for distributed systems

Journal of Computer Systems, Networks, and Communications
Context-aware fault tolerance in migratory services

Proceedings of the 5th Annual International Conference on Mobile and Ubiquitous Systems: Computing, Networking, and Services
A weighted checkpointing protocol for mobile distributed systems

International Journal of Ad Hoc and Ubiquitous Computing
A consistent checkpointing-recovery protocol for minimal number of nodes in mobile computing system

HiPC'07 Proceedings of the 14th international conference on High performance computing
New & efficient low overheads algorithm for mobile distributed systems

Proceedings of the International Conference & Workshop on Emerging Trends in Technology
A proxy based efficient checkpointing scheme for fault recovery in mobile grid system

HiPC'06 Proceedings of the 13th international conference on High Performance Computing
Performance evaluation of parallel systems employing roll-forward checkpoint schemes

ICCSA'06 Proceedings of the 2006 international conference on Computational Science and Its Applications - Volume Part V
A low-overhead non-block checkpointing algorithm for mobile computing environment

GPC'06 Proceedings of the First international conference on Advances in Grid and Pervasive Computing
Soft-Checkpointing Based Hybrid Synchronous Checkpointing Protocol for Mobile Distributed Systems

International Journal of Distributed Systems and Technologies

Abstract

Mobile computing raises many new issues, such as lack of stable storage, low bandwidth of wireless channels, high mobility, and limited battery life. These issues make traditional checkpointing algorithms unsuitable. Coordinated checkpointing is an attractive approach for transparently adding fault tolerance to distributed applications, since it avoids domino effects and minimizes the stable storage requirement. However, it suffers from the high overhead associated with the checkpointing process in mobile computing systems. Two approaches have been used to reduce the overhead: the first is to minimize the number of synchronization messages and the number of checkpoints; the other is to make the checkpointing process nonblocking. These two approaches were orthogonal until the Prakash-Singhal algorithm [28] combined them. However, we [8] found that this algorithm may result in an inconsistency in some situations, and we proved that there does not exist a nonblocking algorithm which forces only a minimum number of processes to take their checkpoints. In this paper, we introduce the concept of a "mutable checkpoint," which is neither a tentative checkpoint nor a permanent checkpoint, to design efficient checkpointing algorithms for mobile computing systems. Mutable checkpoints can be saved anywhere, e.g., in the main memory or on the local disk of mobile hosts (MHs). In this way, taking a mutable checkpoint avoids the overhead of transferring large amounts of data to the stable storage at mobile support stations (MSSs) over the wireless network. We present techniques to minimize the number of mutable checkpoints. Simulation results show that the overhead of taking mutable checkpoints is negligible. Based on mutable checkpoints, our nonblocking algorithm avoids the avalanche effect and forces only a minimum number of processes to take their checkpoints on the stable storage.
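
The cost argument in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: the class `MobileHost`, its methods `take_mutable`, `promote`, and `discard`, and the byte counter are all hypothetical names introduced here. The sketch also assumes, beyond what the abstract states, that a mutable checkpoint is later either copied to stable storage (becoming tentative) or dropped. It only shows that saving a checkpoint locally costs no wireless transfer until the checkpoint is actually moved to stable storage.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Kind(Enum):
    MUTABLE = "mutable"      # held locally on the mobile host (MH)
    TENTATIVE = "tentative"  # copied to stable storage at the MSS


@dataclass
class Checkpoint:
    state: bytes
    kind: Kind


@dataclass
class MobileHost:
    """Hypothetical model of one process on an MH (illustration only)."""
    state: bytes = b""
    local: Optional[Checkpoint] = None    # MH main memory or local disk
    wireless_bytes_sent: int = 0          # bytes shipped over the wireless link
    stable_storage: list = field(default_factory=list)  # stands in for the MSS

    def take_mutable(self) -> None:
        # Saving locally uses no wireless bandwidth.
        self.local = Checkpoint(self.state, Kind.MUTABLE)

    def promote(self) -> None:
        # Assumption: a mutable checkpoint that turns out to be needed is
        # transferred to stable storage and becomes tentative.
        if self.local is None:
            return
        self.wireless_bytes_sent += len(self.local.state)
        self.stable_storage.append(Checkpoint(self.local.state, Kind.TENTATIVE))
        self.local = None

    def discard(self) -> None:
        # Assumption: an unneeded mutable checkpoint is simply dropped.
        self.local = None


# One host whose checkpoint is needed, one whose checkpoint is not.
needed = MobileHost(state=b"x" * 1000)
needed.take_mutable()
needed.promote()

unneeded = MobileHost(state=b"x" * 1000)
unneeded.take_mutable()
unneeded.discard()

print(needed.wireless_bytes_sent, unneeded.wireless_bytes_sent)
```

In this toy model only the promoted checkpoint pays the wireless transfer cost; the discarded one never leaves the mobile host.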