Self-stabilizing algorithm for checkpointing in a distributed system

  • Authors:
  • Partha Sarathi Mandal; Krishnendu Mukhopadhyaya

  • Affiliations:
  • Advanced Computing and Microelectronics Unit, Indian Statistical Institute, 203 B. T. Road, Kolkata 700 108, India;Advanced Computing and Microelectronics Unit, Indian Statistical Institute, 203 B. T. Road, Kolkata 700 108, India

  • Venue:
  • Journal of Parallel and Distributed Computing
  • Year:
  • 2007

Abstract

Existing checkpointing and recovery algorithms may fail if the variables they use suffer data faults. This paper proposes self-stabilizing algorithms for data fault detection and correction, checkpointing, and recovery in a ring topology. The data fault detection and correction algorithms tolerate at most one data fault per process, in any number of processes. The checkpointing algorithm handles multiple concurrent initiations of checkpointing as well as data faults. Using the recovery algorithm, a process can recover from a fault even when multiple data faults are present in the system. All the proposed algorithms converge in O(n) steps, where n is the number of processes. The algorithms can also be extended to general topologies.
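
The abstract does not give the algorithms themselves, so the following is only a generic illustration of what self-stabilization on a ring means. It is a minimal sketch of Dijkstra's classic K-state token ring, not the authors' checkpointing method, and it makes no claim to the O(n) bound stated above. The function names (`privileged`, `move`, `stabilize`) and the randomly chosen scheduler are assumptions made for this example. Starting from arbitrary, possibly corrupted state values, the ring reaches a legitimate configuration in which exactly one process holds the privilege.

```python
import random


def privileged(x):
    """Return the indices of processes that currently hold a privilege."""
    n = len(x)
    holders = []
    # Process 0 is privileged when its state equals that of its
    # ring predecessor, process n-1.
    if x[0] == x[n - 1]:
        holders.append(0)
    # Any other process is privileged when it differs from its predecessor.
    holders.extend(i for i in range(1, n) if x[i] != x[i - 1])
    return holders


def move(x, i, k):
    """Let privileged process i take one step (states are in 0..k-1)."""
    if i == 0:
        x[0] = (x[0] + 1) % k
    else:
        x[i] = x[i - 1]


def stabilize(x, k, rng, max_steps=10_000):
    """Run single-process moves until exactly one privilege remains.

    Assumes k >= len(x) and a scheduler that picks one privileged
    process at a time (here: uniformly at random).
    Returns the number of moves taken.
    """
    steps = 0
    while len(privileged(x)) != 1:
        if steps >= max_steps:
            raise RuntimeError("did not stabilize within max_steps")
        move(x, rng.choice(privileged(x)), k)
        steps += 1
    return steps


if __name__ == "__main__":
    rng = random.Random(0)
    n, k = 6, 7
    # Arbitrary initial states model data faults in any number of processes.
    ring = [rng.randrange(k) for _ in range(n)]
    print("initial states:", ring)
    print("moves to stabilize:", stabilize(ring, k, rng))
    print("stabilized states:", ring)
```

The point of the sketch is the shape of the guarantee: no process needs to know that a fault occurred, and the ring repairs itself from any initial configuration. The paper's algorithms apply the same idea to checkpointing variables.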