CPPC: a compiler-assisted tool for portable checkpointing of message-passing applications

  • Authors:
  • Gabriel Rodríguez; María J. Martín; Patricia González; Juan Touriño; Ramón Doallo

  • Affiliations:
  • Computer Architecture Group, Department of Electronics and Systems, University of A Coruña, Spain (all five authors)

  • Venue:
  • Concurrency and Computation: Practice & Experience - Scalable Tools for High-End Computing
  • Year:
  • 2010

Abstract

With the evolution of high-performance computing toward heterogeneous, massively parallel systems, parallel applications have developed new checkpoint and restart needs. Whether restarting after an execution failure or migrating application processes to different machines, checkpointing tools must be able to operate in heterogeneous environments. However, some of the data manipulated by a parallel application are not truly portable. Examples include opaque state (e.g. data structures for communications support) and features exposed through diverse interfaces (e.g. communications, I/O). Directly manipulating the underlying ad hoc representations renders checkpointing tools unable to work across different environments. Portable checkpointers usually work around portability issues at the cost of transparency: the user must provide information such as what data need to be stored, where to store them, or where to checkpoint. CPPC (ComPiler for Portable Checkpointing) is a checkpointing tool designed to feature both portability and transparency. It is made up of a library and a compiler. The CPPC library contains routines for variable-level checkpointing, using portable code and protocols. The CPPC compiler helps to achieve transparency by relieving the user of time-consuming tasks, such as data flow and communications analyses and the insertion of instrumentation code. This paper covers both the operation of the CPPC library and its compiler support. Experimental results using benchmarks and large-scale real applications are included, demonstrating usability, efficiency, and portability. Copyright © 2009 John Wiley & Sons, Ltd.
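
To make the idea of variable-level checkpointing in a portable format more concrete, the following is a minimal sketch and not CPPC's actual interface. It assumes the application state is a set of named lists of floating-point values and stores them in a fixed little-endian layout, so a checkpoint written on one architecture could be read on another. The function names `save_checkpoint`, `load_checkpoint`, and `roundtrip` are hypothetical and chosen only for illustration.

```python
import io
import struct

# Illustrative sketch only: these helper names are hypothetical and are
# not part of the CPPC library described in the abstract.


def save_checkpoint(stream, variables):
    """Write named lists of floats in a fixed, host-independent layout."""
    # "<" selects little-endian byte order with standard sizes, so the
    # byte layout does not depend on the machine writing the checkpoint.
    stream.write(struct.pack("<I", len(variables)))
    for name, values in variables.items():
        encoded = name.encode("utf-8")
        stream.write(struct.pack("<I", len(encoded)))
        stream.write(encoded)
        stream.write(struct.pack("<I", len(values)))
        stream.write(struct.pack(f"<{len(values)}d", *values))


def load_checkpoint(stream):
    """Read back the variables written by save_checkpoint."""
    (count,) = struct.unpack("<I", stream.read(4))
    restored = {}
    for _ in range(count):
        (name_len,) = struct.unpack("<I", stream.read(4))
        name = stream.read(name_len).decode("utf-8")
        (n,) = struct.unpack("<I", stream.read(4))
        # Each double occupies 8 bytes in the standard-size layout.
        restored[name] = list(struct.unpack(f"<{n}d", stream.read(8 * n)))
    return restored


def roundtrip(variables):
    """Save to an in-memory buffer and restore, simulating a restart."""
    buffer = io.BytesIO()
    save_checkpoint(buffer, variables)
    buffer.seek(0)
    return load_checkpoint(buffer)
```

Only the variables explicitly passed in are stored, which is the sense of "variable-level" assumed here. Opaque state such as communication handles is deliberately absent from the sketch, since the abstract identifies it as non-portable data that cannot simply be written out byte for byte.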