Efficient application migration under compiler guidance
LCTES '05 Proceedings of the 2005 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools for embedded systems
A large-scale study of failures in high-performance computing systems
DSN '06 Proceedings of the International Conference on Dependable Systems and Networks
Leveraging 3D PCRAM technologies to reduce checkpoint overhead for future exascale systems
Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis
Hybrid checkpointing using emerging nonvolatile memories for future exascale systems
ACM Transactions on Architecture and Code Optimization (TACO)
Containment domains: a scalable, efficient, and flexible resilience scheme for exascale systems
SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Accelerating incremental checkpointing for extreme-scale computing
Future Generation Computer Systems
Containment domains: A scalable, efficient and flexible resilience scheme for exascale systems
Scientific Programming - Selected Papers from Super Computing 2012
Hi-index | 0.00 |
As modern supercomputing systems reach peta-flop performance they grow in both size and complexity, becoming increasingly vulnerable to failures. Checkpointing is a popular technique for tolerating such failures. Although a variety of automated system-level checkpointing solutions are currently available to HPC users, manual application-level checkpointing remains more popular due to its superior performance. This paper improves performance of automated checkpointing by presenting a compiler analysis for incremental checkpointing. This analysis, which works with both sequential and OpenMP applications, significantly reduces checkpoint sizes and enables asynchronous checkpointing.