Portable checkpointing and recovery

  • Authors:
  • L. M. Silva; J. G. Silva; S. Chapple; L. Clarke

  • Venue:
  • HPDC '95 Proceedings of the 4th IEEE International Symposium on High Performance Distributed Computing
  • Year:
  • 1995

Abstract

This paper presents a checkpointing scheme implemented in a parallel library that runs on top of CHIMP/MPI. The main goals of the checkpointing mechanism are portability and efficiency. It runs in a machine-independent way on every platform supported by MPI. The scheme supports the migration of checkpoints and offers a flexible recovery mechanism based on data reconfiguration. The paper closes with performance results and with techniques that can be used to increase the efficiency of the checkpointing mechanism.