An Experimental Study about Diskless Checkpointing

  • Authors:
  • Luís M. Silva;João Gabriel Silva

  • Affiliations:
  • -;-

  • Venue:
  • EUROMICRO '98 Proceedings of the 24th Conference on EUROMICRO - Volume 1
  • Year:
  • 1998

Abstract

Checkpointing and rollback-recovery is a very effective technique for tolerating failures. Usually, the checkpoint data is saved in disk files. However, in some situations the disk operations may result in considerable performance overhead. Alternative solutions use main memory to hold the checkpoint data. This paper presents two main-memory checkpointing schemes that can be used in any parallel machine without requiring any change to the hardware: one scheme saves the checkpoints in the memory of other processors, while the other is based on a parity approach. Both techniques have been implemented and evaluated on a commercial parallel machine. The results clearly show the superiority of one of those schemes.
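
The parity idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the common XOR-parity formulation, equal-length checkpoints, and a single failure at a time, and every name in it (xor_bytes, parity_of, recover, the sample data) is hypothetical.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length buffers."""
    return bytes(x ^ y for x, y in zip(a, b))


def parity_of(checkpoints: list[bytes]) -> bytes:
    """Parity block: XOR of all checkpoints (assumed to have equal length)."""
    return reduce(xor_bytes, checkpoints)


def recover(surviving: list[bytes], parity: bytes) -> bytes:
    """Rebuild the single lost checkpoint from the survivors and the parity."""
    return reduce(xor_bytes, surviving, parity)


# Hypothetical in-memory checkpoints of four processors (illustrative data).
checkpoints = [b"AAAA", b"BBBB", b"CCCC", b"DDDD"]

# The parity block would be held in the memory of a separate processor.
parity = parity_of(checkpoints)

# Simulate the failure of processor 2 and rebuild its checkpoint.
lost = 2
surviving = checkpoints[:lost] + checkpoints[lost + 1:]
recovered = recover(surviving, parity)
```

Under these assumptions, only one extra checkpoint-sized buffer is needed for the whole group, whereas saving each checkpoint in the memory of another processor requires a full copy per processor. The sketch says nothing about which scheme the paper found superior.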