Novel Crash Recovery Approach for Concurrent Failures in Cluster Federation

  • Authors:
  • Bidyut Gupta; Shahram Rahimi

  • Affiliations:
  • Department of Computer Science, Southern Illinois University, Carbondale, IL 62901, USA; Department of Computer Science, Southern Illinois University, Carbondale, IL 62901, USA

  • Venue:
  • GPC '09 Proceedings of the 4th International Conference on Advances in Grid and Pervasive Computing
  • Year:
  • 2009

Abstract

In this paper, we propose a simple and efficient approach to checkpointing and recovery in a cluster computing environment. The recovery scheme handles both orphan and lost messages, whether they are intra-cluster or inter-cluster. The checkpointing scheme ensures that after the system recovers from failures, all processes in the different clusters can restart from their respective most recent checkpoints, which avoids any domino effect. In other words, the most recent checkpoints always form a consistent recovery line for the cluster federation. The main features of our work are as follows. (1) It uses selective message logging, which lets the initiator process in each cluster log the minimum number of messages. (2) The recovery scheme is free of the domino effect and is executed simultaneously by all clusters in the federation. (3) It handles concurrent failures. (4) The message complexity in each cluster is only O(n) for both the checkpointing and the recovery schemes, where n is the number of processes in a cluster. These features make our algorithm superior to existing approaches.
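The abstract depends on the standard notions of orphan messages, lost messages, and a consistent recovery line. The Python sketch below illustrates those textbook definitions. It is not the authors' algorithm. Each process's checkpoint is modeled as the count of local events it captures. A message is orphan if its receipt is recorded but its send is not. It is lost (in transit) if its send is recorded but its receipt is not. A set of checkpoints is consistent when there are no orphan messages. The names `Message`, `classify`, `is_consistent`, `messages_to_log`, `is_inter_cluster`, and the event-index model are all hypothetical. The rule "log only lost messages" is one plausible reading of selective logging, not the paper's exact rule.

```python
from dataclasses import dataclass
from enum import Enum


class Kind(Enum):
    NORMAL = "normal"   # send and receive both recorded in the checkpoints
    ORPHAN = "orphan"   # receive recorded, send not: breaks consistency
    LOST = "lost"       # send recorded, receive not: must be replayed from a log
    FUTURE = "future"   # neither recorded: regenerated by re-execution


@dataclass(frozen=True)
class Message:
    # Hypothetical model: each process numbers its own events 0, 1, 2, ...
    sender: str
    send_event: int
    receiver: str
    recv_event: int


def classify(msg: Message, checkpoint: dict[str, int]) -> Kind:
    """checkpoint[p] = number of p's local events captured by p's checkpoint."""
    sent = msg.send_event < checkpoint[msg.sender]
    received = msg.recv_event < checkpoint[msg.receiver]
    if sent and received:
        return Kind.NORMAL
    if received:
        return Kind.ORPHAN
    if sent:
        return Kind.LOST
    return Kind.FUTURE


def is_consistent(messages: list[Message], checkpoint: dict[str, int]) -> bool:
    """A recovery line is consistent when it leaves no orphan messages."""
    return all(classify(m, checkpoint) is not Kind.ORPHAN for m in messages)


def messages_to_log(messages: list[Message], checkpoint: dict[str, int]) -> list[Message]:
    """Assumed selective-logging rule: keep only messages in transit across the line."""
    return [m for m in messages if classify(m, checkpoint) is Kind.LOST]


def is_inter_cluster(msg: Message, cluster_of: dict[str, str]) -> bool:
    """True when sender and receiver belong to different clusters."""
    return cluster_of[msg.sender] != cluster_of[msg.receiver]
```

For example, suppose P1 and P2 are in cluster A, P3 is in cluster B, and the checkpoints are `{"P1": 2, "P2": 2, "P3": 1}`. Then a message P1→P3 sent at P1's event 1 and received at P3's event 1 is a lost inter-cluster message. A message P3→P2 sent at P3's event 2 and received at P2's event 1 is orphan, so this line is inconsistent. Advancing P3's checkpoint past both events removes the orphan and makes the line consistent.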