Process Introspection is a fundamentally new solution to the process checkpoint/restart problem suitable for use in high-performance heterogeneous distributed systems. A process checkpoint/restart mechanism for such an environment has the primary requirement that it must be platform-independent: process checkpoints produced on a computer system of one architecture or operating system platform must be restartable on a computer system of a different architecture or operating system platform. The central feature of the Process Introspection approach is automatic augmentation of program code to incorporate checkpoint and restart functionality. This program modification is performed at a platform-independent intermediate level of code representation, and preserves the original program semantics. This approach has attractive properties including portability, ease of use, customizability to application-specific requirements, and flexibility with respect to basic performance trade-offs. Our solution is novel in its true platform and run-time system independence: no system support or non-portable code is required by our core mechanisms. Recent experimental results obtained using a prototype implementation of the Process Introspection system indicate that the overheads introduced by the mechanisms are acceptable for computationally demanding applications.
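
To make the idea of code augmented with checkpoint and restart functionality more concrete, the following is a minimal hand-written sketch, not the Process Introspection implementation. The abstract describes an automatic transformation applied at an intermediate code level; here the extra code is inserted by hand in Python, and every name (`sum_squares`, `checkpoint_at`, `restore_from`) and the use of JSON as the portable state format are assumptions made purely for illustration.

```python
import json


def sum_squares(n, checkpoint_at=None, restore_from=None):
    """Sum i*i for i in range(n), with hand-inserted checkpoint/restart code.

    All names and the JSON format are hypothetical, chosen for illustration.
    """
    # Restart prologue: reload the live variables from a portable record.
    if restore_from is not None:
        state = json.loads(restore_from)
        i, total = state["i"], state["total"]
    else:
        i, total = 0, 0

    while i < n:
        # Poll point: when a checkpoint is requested here, save the live
        # variables as text rather than raw memory or stack contents.
        if checkpoint_at is not None and i == checkpoint_at:
            return ("checkpoint", json.dumps({"i": i, "total": total}))
        total += i * i
        i += 1
    return ("done", total)


# Take a checkpoint part-way through, then resume from the saved record.
status, saved = sum_squares(10, checkpoint_at=4)
resumed_status, resumed_total = sum_squares(10, restore_from=saved)
```

Because the saved record contains only named values in a text format, it could in principle be reloaded on a machine with a different architecture or operating system, which is the property the abstract calls platform independence. The original computation is unchanged when no checkpoint is requested.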