In recent years there has been a significant effort to develop middleware that facilitates the execution of applications on grid infrastructures. However, support for fault-tolerant execution remains scarce. The CPPC-G framework is a service-based architecture designed to provide efficient fault-tolerance mechanisms for the execution of sequential and parallel applications on grids. Applications to be managed by CPPC-G are expected to be preprocessed with CPPC (ComPiler for Portable Checkpointing), a tool that automatically inserts portable checkpoint instrumentation into the code of parallel applications. Built on top of existing Globus services, the CPPC-G services are in charge of submitting and monitoring CPPC applications, managing the generated checkpoint files, detecting failures, and automatically restarting failed executions. In this paper, the feasibility of this approach is assessed by measuring the performance of CPPC-G and quantifying its impact on application performance. Results show that the increase in overall throughput and availability comes at the cost of only minor performance degradation.
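The submit, monitor, checkpoint, detect, and restart cycle described above can be illustrated with a minimal, self-contained sketch. It is not CPPC-G code and does not use any Globus or CPPC API: every name in it (`CheckpointStore`, `run_job`, `supervise`, `JobFailure`) is hypothetical, the checkpoint store is an in-memory stand-in for checkpoint files, and failures are simulated at chosen step numbers.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple


class JobFailure(Exception):
    """Raised by the simulated job when a (hypothetical) node failure occurs."""


@dataclass
class CheckpointStore:
    """In-memory stand-in for a checkpoint file manager (assumption)."""
    latest: Optional[int] = None

    def save(self, completed_steps: int) -> None:
        # Record how many steps were completed when the checkpoint was taken.
        self.latest = completed_steps

    def restart_point(self) -> int:
        # With no checkpoint available, the job restarts from the beginning.
        return 0 if self.latest is None else self.latest


def run_job(start: int, total_steps: int, ckpt_every: int,
            store: CheckpointStore, pending_failures: Set[int]) -> None:
    """Run steps start..total_steps-1, checkpointing every ckpt_every steps."""
    for step in range(start, total_steps):
        if step in pending_failures:
            # Each simulated failure fires only once.
            pending_failures.discard(step)
            raise JobFailure(f"simulated failure at step {step}")
        completed = step + 1
        if completed % ckpt_every == 0:
            store.save(completed)


def supervise(total_steps: int, ckpt_every: int, fail_at: Set[int],
              max_restarts: int = 5) -> Tuple[int, List[int]]:
    """Submit the job, detect failures, and restart from the last checkpoint.

    Returns the number of restarts and the step each restart resumed from.
    """
    store = CheckpointStore()
    pending = set(fail_at)  # copy so the caller's set is not mutated
    resume_points: List[int] = []
    start = 0
    while True:
        try:
            run_job(start, total_steps, ckpt_every, store, pending)
            return len(resume_points), resume_points
        except JobFailure:
            if len(resume_points) >= max_restarts:
                raise RuntimeError("restart limit reached")
            start = store.restart_point()
            resume_points.append(start)


if __name__ == "__main__":
    restarts, resumes = supervise(total_steps=10, ckpt_every=3, fail_at={4, 8})
    print(restarts, resumes)
```

In this toy model, a shorter checkpoint interval reduces the work repeated after a failure but adds more checkpoint operations, which mirrors the trade-off between availability and overhead that the abstract refers to.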