A Domain-Specific Language for Application-Level Checkpointing
ICDCIT '08 Proceedings of the 5th International Conference on Distributed Computing and Internet Technology
Checkpointing is a key element of self-healing applications for distributed and dynamic computing environments. It makes an application resilient to failures by periodically saving its state to disk, so that execution can resume from the last saved state after a failure. The main goal of this research is to enable non-invasive reengineering of existing applications so that an Application-Level Checkpointing (ALC) mechanism can be inserted into them. The Domain-Specific Language (DSL) developed in this research serves this goal: it is used to obtain ALC specifications from end users, and these specifications drive the generation and insertion of the actual checkpointing code into the existing application. The performance of an application with the generated checkpointing code is comparable to that of the same application with manually inserted checkpointing code. With slight modifications, the DSL can be used to specify the ALC mechanism for several base languages (e.g., C/C++, Java, and FORTRAN).
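To make the idea of application-level checkpointing concrete, the following is a minimal sketch in Python of the kind of code that would otherwise be written by hand: a loop that saves its own state to disk at a fixed interval and resumes from the last saved state on restart. It is not the DSL or the generated code described in the abstract. The function names (`save_checkpoint`, `load_checkpoint`, `run`), the `fail_at` parameter used to simulate a crash, and the choice of `pickle` as the storage format are all assumptions made for illustration.

```python
import os
import pickle
import tempfile


def save_checkpoint(path, state):
    """Write state atomically: dump to a temp file, then rename over the target."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "wb") as f:
        pickle.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    # os.replace swaps the file in one step, so a crash never leaves a half-written checkpoint.
    os.replace(tmp_path, path)


def load_checkpoint(path):
    """Return the saved state, or None if no checkpoint exists yet."""
    if not os.path.exists(path):
        return None
    with open(path, "rb") as f:
        return pickle.load(f)


def run(path, n_steps, interval, fail_at=None):
    """Sum 0..n_steps-1, checkpointing every `interval` completed steps.

    `fail_at` is a hypothetical hook that simulates a crash before that step.
    """
    # Resume from the last checkpoint if there is one, else start fresh.
    state = load_checkpoint(path) or {"step": 0, "total": 0}
    while state["step"] < n_steps:
        if fail_at is not None and state["step"] == fail_at:
            raise RuntimeError("simulated failure")
        state["total"] += state["step"]
        state["step"] += 1
        if state["step"] % interval == 0:
            save_checkpoint(path, state)
    return state["total"]
```

In this sketch the application decides what counts as its state (here, a step counter and a running total) and when to save it. Those are the decisions that, per the abstract, the end user would express as ALC specifications rather than code by hand.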