Compiler-assisted full checkpointing
Software—Practice & Experience
PVM: Parallel virtual machine: a users' guide and tutorial for networked parallel computing
Message Logging: Pessimistic, Optimistic, Causal, and Optimal
IEEE Transactions on Software Engineering
A Checkpointing Tool for Palm Operating System
DSN '01 Proceedings of the 2001 International Conference on Dependable Systems and Networks (formerly: FTCS)
Portable Checkpointing for Heterogeneous Architectures
FTCS '97 Proceedings of the 27th International Symposium on Fault-Tolerant Computing (FTCS '97)
RENEW: A Tool for Fast and Efficient Implementation of Checkpoint Protocols
FTCS '98 Proceedings of the Twenty-Eighth Annual International Symposium on Fault-Tolerant Computing
Message Logging in Mobile Computing
FTCS '99 Proceedings of the Twenty-Ninth Annual International Symposium on Fault-Tolerant Computing
Checkpointing in CosMiC: A User-Level Process Migration Environment
PRFTS '97 Proceedings of the 1997 Pacific Rim International Symposium on Fault-Tolerant Systems
Fault Detection Using Hints from the Socket Layer
SRDS '97 Proceedings of the 16th Symposium on Reliable Distributed Systems
Low-Cost Checkpointing with Mutable Checkpoints in Mobile Computing Systems
ICDCS '98 Proceedings of the 18th International Conference on Distributed Computing Systems
Using Time to Improve the Performance of Coordinated Checkpointing
IPDS '96 Proceedings of the 2nd International Computer Performance and Dependability Symposium (IPDS '96)
High-Level Fault Tolerance in Distributed Programs
M-JavaMPI: A Java-MPI Binding with Process Migration Support
CCGRID '02 Proceedings of the 2nd IEEE/ACM International Symposium on Cluster Computing and the Grid
Checkpointing and Its Applications
FTCS '95 Proceedings of the Twenty-Fifth International Symposium on Fault-Tolerant Computing
Libckpt: transparent checkpointing under Unix
TCON'95 Proceedings of the USENIX 1995 Technical Conference Proceedings
Heterogeneous computing environments, in which computers may differ in instruction set architecture, data representation, and operating system, complicate the checkpointing and recovery of processes. This paper describes an approach to recovery and its implementation, PREACHES, which provides portable checkpointing of single-process applications in heterogeneous systems using checkpoint propagation. The checkpoint propagation mechanism creates a machine-dependent checkpoint for each architecture in the heterogeneous environment. A process is then restored on a given machine from the checkpoint that matches that machine's architecture. PREACHES has been evaluated on a heterogeneous network of workstations that includes Sun, HP, and Pentium machines. The experimental results show that PREACHES achieves efficient checkpointing and rapid recovery.
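The idea of checkpoint propagation can be illustrated with a minimal sketch. This is not the PREACHES implementation: the architecture labels, the byte-order table, the two-field process state, and every function name below are assumptions made for illustration. The sketch writes one machine-dependent copy of a toy state per architecture, differing only in byte order, and restores from the copy that matches the target machine.

```python
import struct

# Hypothetical byte-order table; the architecture labels are illustrative,
# not taken from the paper. ">" is big-endian, "<" is little-endian.
BYTE_ORDER = {"sparc": ">", "pa-risc": ">", "x86": "<"}


def encode_state(arch, counter, total):
    """Pack a toy process state (int32 counter, float64 total) in arch's byte order."""
    return struct.pack(BYTE_ORDER[arch] + "id", counter, total)


def decode_state(arch, blob):
    """Unpack a checkpoint that was written for the given architecture."""
    return struct.unpack(BYTE_ORDER[arch] + "id", blob)


def propagate_checkpoint(counter, total):
    """Create one machine-dependent checkpoint per known architecture."""
    return {arch: encode_state(arch, counter, total) for arch in BYTE_ORDER}


def restore(checkpoints, arch):
    """Restore from the checkpoint that matches the target architecture."""
    return decode_state(arch, checkpoints[arch])


checkpoints = propagate_checkpoint(42, 3.5)
state = restore(checkpoints, "x86")
```

A real machine-dependent checkpoint would also have to account for stack layout, alignment, pointer representation, and operating system state. The sketch covers only byte order, which is the simplest of the data-representation differences the abstract mentions.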