Checkpointing and Recovery of Shared Memory Parallel Applications in a Cluster

  • Authors:
  • Ramamurthy Badrinath;Christine Morin;Geoffroy Vallée

  • Venue:
  • CCGRID '03 Proceedings of the 3rd International Symposium on Cluster Computing and the Grid
  • Year:
  • 2003

Abstract

This paper describes issues in the design and implementation of checkpointing and recovery modules for the Kerrighed DSM cluster system. Our design targets a DSM that supports the sequential consistency model. The mechanisms are general enough to be used in a number of different checkpointing and recovery protocols. They are designed to support common performance optimizations suggested in the literature while remaining lightweight during fault-free execution. We also present preliminary performance results of the current implementation.