Current approaches to checkpointing assume system homogeneity: checkpointing and recovery are both performed on the same processor architecture and operating-system configuration. Sometimes, however, it is desirable or necessary to recover a failed computation on a different processor architecture. In such situations, checkpointing and recovery must be portable. In this paper, we argue that source-to-source compilation is an appropriate approach for this purpose. We describe the compilation techniques we developed for the design of the c2ftc prototype. The c2ftc compiler enables machine-independent checkpoints by automatically generating checkpointing and recovery code. Sequential C programs are compiled into fault-tolerant C programs whose checkpoints can be migrated across heterogeneous networks and restarted on binary-incompatible architectures. Experimental results on several systems provide evidence that the performance penalty of portable checkpointing is negligible for realistic checkpointing frequencies.
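To make the idea of compiler-generated, machine-independent checkpoints concrete, the following is a minimal hand-written sketch of the *kind* of code a source-to-source pass could emit for a simple loop. It is not c2ftc's actual output or checkpoint format, which the abstract does not describe. All names (`ckpt_buf`, `ckpt_put_u64`, `ckpt_get_i64`, `sum_squares`) are hypothetical, and the wire format (every value widened to 64 bits and stored big-endian in an in-memory buffer rather than a file) is an assumption chosen so that the saved bytes mean the same thing on any host, regardless of native byte order or word size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical portable checkpoint buffer (assumption: in memory, not a file).
   Every value is written as 64 bits, most significant byte first. */
typedef struct {
    unsigned char data[64];
    size_t len; /* bytes written by the checkpoint */
    size_t pos; /* read cursor used during recovery */
} ckpt_buf;

static void ckpt_put_u64(ckpt_buf *b, uint64_t v) {
    assert(b->len + 8 <= sizeof b->data);
    for (int shift = 56; shift >= 0; shift -= 8)
        b->data[b->len++] = (unsigned char)(v >> shift);
}

static uint64_t ckpt_get_u64(ckpt_buf *b) {
    assert(b->pos + 8 <= b->len);
    uint64_t v = 0;
    for (int k = 0; k < 8; k++)
        v = (v << 8) | b->data[b->pos++];
    return v;
}

/* Signed values travel as two's complement; decoding avoids the
   implementation-defined unsigned-to-signed conversion. */
static void ckpt_put_i64(ckpt_buf *b, int64_t v) { ckpt_put_u64(b, (uint64_t)v); }

static int64_t ckpt_get_i64(ckpt_buf *b) {
    uint64_t u = ckpt_get_u64(b);
    if (u <= (uint64_t)INT64_MAX)
        return (int64_t)u;
    return -(int64_t)(~u) - 1; /* u - 2^64 */
}

/* Instrumented form of:
       int64_t s = 0;
       for (int64_t i = 0; i < n; i++) s += i * i;
   The live variables (i, s) are saved at a checkpoint and restored on restart.
   Returns 1 and stores the result in *out on completion, or 0 if the run
   "fails" right after checkpointing at iteration fail_at (simulated crash). */
static int sum_squares(int64_t n, int64_t fail_at, ckpt_buf *ck,
                       int restarting, int64_t *out) {
    int64_t i = 0, s = 0;
    if (restarting) { /* recovery: rebuild live state from the checkpoint */
        ck->pos = 0;
        i = ckpt_get_i64(ck);
        s = ckpt_get_i64(ck);
    }
    for (; i < n; i++) {
        if (i == fail_at) { /* checkpoint, then simulate a failure */
            ck->len = 0;
            ckpt_put_i64(ck, i);
            ckpt_put_i64(ck, s);
            return 0;
        }
        s += i * i;
    }
    *out = s;
    return 1;
}
```

Calling `sum_squares(10, 4, &ck, 0, &r)` stops at `i == 4` with `s == 14` saved in the buffer; calling `sum_squares(10, -1, &ck, 1, &r)` (possibly on a different machine) resumes from that state and yields 285, the same result as an uninterrupted run. Only scalar integer state is handled here; pointers, floating-point values, and heap objects would need a richer machine-independent representation.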