Concurrency and Computation: Practice & Experience
Utility computing is becoming a popular way of exploiting the potential of computational grids. In utility computing, users are provided with computational power in a transparent manner, similar to the way in which electrical utilities supply power to their customers. To take full advantage of utility computing, an application needs to be mobile; that is, it needs to be able to migrate between heterogeneous computing platforms while it is executing. Further, it needs to be able to adapt to the computing resources at each site, such as the number of available physical processors. At present, there are few high-performance computing applications of this sort, and re-engineering legacy codes to be mobile can take enormous effort.

In this paper, we describe the $PC^3$ system, which converts C/MPI codes into mobile programs almost transparently. Because it is based on portable application-level checkpointing, it enables the state of running applications to be saved so that the application can be restarted on different architectures, operating systems and MPI implementations. Moreover, the number of processors on these machines can be different. To our knowledge, this is the first system to provide all these features. Experimental results show that the overhead introduced by the system is usually small.
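To make the idea of portable application-level checkpointing concrete, the following is a minimal sketch in C and is not the $PC^3$ implementation. It assumes a hypothetical `app_state` struct, hypothetical helpers `save_checkpoint` and `restore_checkpoint`, and that both machines represent `double` as IEEE 754 binary64. The application writes its own state field by field in a fixed byte order, so the file does not depend on the endianness or struct layout of the machine that produced it. MPI state and changes in processor count are left out entirely.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define MAX_N 8 /* hypothetical capacity of the state array */

/* Hypothetical application state: a loop counter plus live data. */
typedef struct {
    uint64_t iteration;
    uint64_t n;           /* number of valid entries in data */
    double   data[MAX_N];
} app_state;

/* Write a 64-bit value in big-endian order, independent of host endianness. */
static int put_u64(FILE *f, uint64_t v) {
    unsigned char b[8];
    for (int i = 0; i < 8; i++)
        b[i] = (unsigned char)(v >> (56 - 8 * i));
    return fwrite(b, 1, 8, f) == 8 ? 0 : -1;
}

/* Read a 64-bit big-endian value back into host representation. */
static int get_u64(FILE *f, uint64_t *v) {
    unsigned char b[8];
    if (fread(b, 1, 8, f) != 8)
        return -1;
    *v = 0;
    for (int i = 0; i < 8; i++)
        *v = (*v << 8) | b[i];
    return 0;
}

/* Save the state field by field; returns 0 on success, -1 on failure. */
int save_checkpoint(const char *path, const app_state *s) {
    assert(sizeof(double) == sizeof(uint64_t)); /* assumption: binary64 */
    assert(s->n <= MAX_N);
    FILE *f = fopen(path, "wb");
    if (!f)
        return -1;
    int rc = put_u64(f, s->iteration);
    if (rc == 0)
        rc = put_u64(f, s->n);
    for (uint64_t i = 0; rc == 0 && i < s->n; i++) {
        uint64_t bits;
        memcpy(&bits, &s->data[i], sizeof bits); /* raw bits of the double */
        rc = put_u64(f, bits);
    }
    if (fclose(f) != 0)
        rc = -1;
    return rc;
}

/* Restore the state; returns 0 on success, -1 on a short or invalid file. */
int restore_checkpoint(const char *path, app_state *s) {
    FILE *f = fopen(path, "rb");
    if (!f)
        return -1;
    int rc = get_u64(f, &s->iteration);
    if (rc == 0)
        rc = get_u64(f, &s->n);
    if (rc == 0 && s->n > MAX_N)
        rc = -1; /* reject a count that does not fit the array */
    for (uint64_t i = 0; rc == 0 && i < s->n; i++) {
        uint64_t bits;
        rc = get_u64(f, &bits);
        if (rc == 0)
            memcpy(&s->data[i], &bits, sizeof bits);
    }
    fclose(f);
    return rc;
}
```

An application would call `save_checkpoint` at a point it chooses, for example at the end of an outer loop iteration, and call `restore_checkpoint` on restart to resume from the saved `iteration`. Writing explicit fields rather than dumping raw memory is what keeps the file usable across architectures in this sketch.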