A lot of research has been done on fault tolerance for MPI applications, some on checkpoint/restart and some on network fault tolerance. Process migration, however, has not gained widespread use, because every other process in the application must learn the new location of a migrated process, which adds complexity. Here we present a simple yet effective method of process migration based on coordinated checkpointing of MPI applications. Migration is achieved by checkpointing the application, modifying the process location information in the checkpoint files, and restarting the application. Checkpoint/restart and migration are transparent to MPI applications. Performance evaluation showed that the added checkpoint/restart capability has little impact on application performance, and that the migration method scales well to a large number of nodes.
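The three migration steps named in the abstract (checkpoint, edit the location information, restart) can be illustrated with a toy sketch. This is not the paper's implementation and does not use MPI: the JSON checkpoint layout, the file naming, and the helper names `write_checkpoint`, `migrate`, `restart`, and `demo` are all assumptions made for illustration. It only shows why rewriting the location table in every checkpoint file lets all processes agree on the new location after restart.

```python
import json
import tempfile
from pathlib import Path


def write_checkpoint(directory, rank, state, locations):
    """Save one rank's state together with its view of the rank -> host table.

    The JSON layout is hypothetical; real checkpoint files are opaque images.
    """
    path = Path(directory) / f"rank{rank}.ckpt.json"
    record = {"rank": rank, "state": state, "locations": locations}
    path.write_text(json.dumps(record))
    return path


def migrate(directory, rank, new_host):
    """Point `rank` at `new_host` in every checkpoint file, so all peers agree."""
    for path in sorted(Path(directory).glob("rank*.ckpt.json")):
        record = json.loads(path.read_text())
        record["locations"][str(rank)] = new_host
        path.write_text(json.dumps(record))


def restart(directory):
    """Reload every checkpoint and return a {rank: record} mapping."""
    records = {}
    for path in sorted(Path(directory).glob("rank*.ckpt.json")):
        record = json.loads(path.read_text())
        records[record["rank"]] = record
    return records


def demo():
    # Hypothetical host names; JSON object keys are strings, so ranks are too.
    locations = {"0": "node0", "1": "node1", "2": "node2"}
    with tempfile.TemporaryDirectory() as directory:
        # Step 1: coordinated checkpoint (every rank saves at the same point).
        for rank in range(3):
            write_checkpoint(directory, rank, {"iteration": 42}, dict(locations))
        # Step 2: modify the location information in the checkpoint files.
        migrate(directory, rank=1, new_host="node7")
        # Step 3: restart from the modified checkpoints.
        return restart(directory)
```

Calling `demo()` returns the reloaded records, in which every rank's location table lists rank 1 on the new host while each rank's saved state is unchanged. Because the update is applied offline to the files, no process has to be notified at run time.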