Checkpointing an application is the act of saving the application's state to stable storage during its execution, so that if the application fails it can be restarted from the last saved state, thereby avoiding the loss of work already done. A heterogeneous checkpoint/restart mechanism allows an application to be restarted from a state that was saved on a hardware architecture and/or operating system different from those of the machine on which it is restarted. This paper explores how to construct such a mechanism at the virtual machine level. That is, rather than dumping the entire state of the application process, the mechanism reported here dumps the state of the application with respect to a virtual machine. During restart, the saved state is loaded into a new copy of the virtual machine, which continues running from there. The heterogeneous checkpoint/restart mechanism reported here was developed for the OCaml variant of ML. The paper reports on the main issues encountered in building such a mechanism and the design choices made, presents performance evaluations, and discusses some lessons and ideas for extending the work to native-code OCaml and to Java Virtual Machines.
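The idea of checkpointing at the virtual machine level can be illustrated with a minimal sketch. It is not the paper's OCaml implementation: the toy stack machine, its two opcodes, and the names `VM`, `checkpoint`, and `restart` are all hypothetical, and JSON is assumed here only as a convenient architecture-neutral encoding. The point is that the saved state consists solely of VM-level values (program counter and operand stack), not a dump of the host process.

```python
import json
from dataclasses import dataclass, field

# Hypothetical toy bytecode: computes 1 + 2 + 3 + 4.
PROGRAM = [
    ("push", 1), ("push", 2), ("add", None),
    ("push", 3), ("add", None),
    ("push", 4), ("add", None),
    ("halt", None),
]

@dataclass
class VM:
    program: list
    pc: int = 0                                  # index of the next instruction
    stack: list = field(default_factory=list)    # operand stack

    def step(self) -> bool:
        """Execute one instruction; return False once the program halts."""
        op, arg = self.program[self.pc]
        if op == "halt":
            return False
        if op == "push":
            self.stack.append(arg)
        elif op == "add":
            b, a = self.stack.pop(), self.stack.pop()
            self.stack.append(a + b)
        else:
            raise ValueError(f"unknown opcode {op!r}")
        self.pc += 1
        return True

    def run(self, max_steps=None):
        """Run until halt, or for at most max_steps instructions."""
        steps = 0
        while (max_steps is None or steps < max_steps) and self.step():
            steps += 1
        return self.stack[-1] if self.stack else None

def checkpoint(vm: VM) -> str:
    # Save only VM-level state, in a text format that does not depend on
    # the host's word size or byte order.
    return json.dumps({"pc": vm.pc, "stack": vm.stack})

def restart(program: list, saved: str) -> VM:
    # Load the saved state into a fresh VM instance, possibly on another host.
    state = json.loads(saved)
    return VM(program, pc=state["pc"], stack=state["stack"])

# Usage: run three instructions, checkpoint, then resume in a new VM.
vm = VM(PROGRAM)
vm.run(max_steps=3)
saved = checkpoint(vm)
resumed = restart(PROGRAM, saved)
result = resumed.run()
```

Because the program itself is reloaded at restart, as a bytecode file would be, only the mutable VM state needs to be written out. A real virtual machine would also have to capture the heap, open channels, and other runtime structures, which this sketch omits.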