Compiler-enhanced incremental checkpointing for OpenMP applications

Authors:
Greg Bronevetsky;Daniel J. Marques;Keshav K. Pingali;Radu Rugina;Sally A. McKee
Affiliations:
Lawrence Livermore National Laboratory, Livermore, CA, USA;University of Texas at Austin, Austin, TX, USA;University of Texas at Austin, Austin, TX, USA;Cornell University, Ithaca, NY, USA;Cornell University, Ithaca, NY, USA
Venue:
Proceedings of the 13th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
Year:
2008



Abstract

As modern supercomputing systems reach petaflop performance, they grow in both size and complexity and become increasingly vulnerable to failures. Checkpointing is a popular technique for tolerating such failures. Although a variety of automated system-level checkpointing solutions are currently available to HPC users, manual application-level checkpointing remains more popular because of its superior performance. This paper improves the performance of automated checkpointing by presenting a compiler analysis for incremental checkpointing. This analysis, which works with both sequential and OpenMP applications, significantly reduces checkpoint sizes and enables asynchronous checkpointing.
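
To make the general idea of incremental checkpointing concrete, the following is a minimal sketch and not the paper's method. It tracks modified state at run time through an explicit `write` call, whereas the abstract describes a compiler analysis. The class `IncrementalCheckpointer`, its methods, and the example variables (`grid`, `step`, `params`) are all hypothetical names chosen for illustration.

```python
import copy


class IncrementalCheckpointer:
    """Hypothetical illustration: save only state modified since the last checkpoint."""

    def __init__(self, state):
        self.state = state            # live application state: name -> value
        self.dirty = set(state)       # nothing has been saved yet, so all of it is dirty
        self.checkpoints = []         # saved deltas, oldest first

    def write(self, name, value):
        # Every update goes through here so the modified variable is recorded.
        self.state[name] = value
        self.dirty.add(name)

    def checkpoint(self):
        # Copy only the variables written since the previous checkpoint.
        delta = {name: copy.deepcopy(self.state[name]) for name in self.dirty}
        self.checkpoints.append(delta)
        self.dirty.clear()
        return delta

    def restore(self):
        # Replay the deltas in order; later values overwrite earlier ones.
        recovered = {}
        for delta in self.checkpoints:
            recovered.update(copy.deepcopy(delta))
        return recovered


ckpt = IncrementalCheckpointer({"grid": [0.0] * 4, "step": 0, "params": {"dt": 0.1}})
first = ckpt.checkpoint()    # full checkpoint: all three variables
ckpt.write("step", 1)
second = ckpt.checkpoint()   # incremental checkpoint: only "step"
```

In this sketch the second checkpoint holds a single variable rather than the full state, which is the size reduction that incremental checkpointing aims for. A static analysis such as the one the abstract describes would aim to establish which data may have changed without this kind of run-time bookkeeping.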