Process checkpointing means saving the state of a process so that it can be reconstructed later. Checkpointing followed by restore is important for load balancing and fault tolerance. For load balancing, processes may have to be migrated among workstations; before migration, a process must be checkpointed so that it can resume from where it left off. For fault tolerance, a process must be restorable at a different site, so an earlier checkpoint must be available for the restore. In both cases the process is restarted from its latest checkpoint, so the work done before that checkpoint is not wasted. This paper discusses simple techniques for implementing user-level checkpoint and restore operations for Unix processes. The techniques require no changes to user programs or to the operating system. The details given show the simplicity of the implementation.
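The restart-from-latest-checkpoint behaviour described above can be illustrated with a small sketch. This is not the paper's technique: the paper checkpoints unmodified Unix processes at user level, whereas the sketch below is a simplified, cooperative analogue in which the program saves its own state. The names `save_checkpoint`, `load_checkpoint`, `run`, and the `fail_at` parameter are hypothetical and exist only for illustration.

```python
import os
import pickle
import tempfile


def save_checkpoint(path, state):
    """Write state atomically: dump to a temp file, then rename it over path."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "wb") as f:
        pickle.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    # os.replace is atomic, so a reader never sees a half-written checkpoint.
    os.replace(tmp, path)


def load_checkpoint(path, default):
    """Return the latest saved state, or default if no checkpoint exists."""
    try:
        with open(path, "rb") as f:
            return pickle.load(f)
    except FileNotFoundError:
        return default


def run(path, total_steps, checkpoint_every, fail_at=None):
    """Sum the integers 0..total_steps-1, checkpointing periodically.

    fail_at (hypothetical) simulates a crash or migration by raising
    just before the given step is executed.
    """
    # Resume from the latest checkpoint if one exists, else start fresh.
    state = load_checkpoint(path, {"step": 0, "acc": 0})
    while state["step"] < total_steps:
        if fail_at is not None and state["step"] == fail_at:
            raise RuntimeError("simulated failure")
        state["acc"] += state["step"]
        state["step"] += 1
        if state["step"] % checkpoint_every == 0:
            save_checkpoint(path, state)
    return state["acc"]
```

If a run is interrupted, calling `run` again with the same checkpoint path repeats only the steps executed after the last saved checkpoint; the work done before it is kept.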