Domino-effect free crash recovery for concurrent failures in cluster federation

  • Authors:
  • Bidyut Gupta;Shahram Rahimi;Vineel Allam;Vamshi Jupally

  • Affiliations:
  • Computer Science Department, Southern Illinois University, Carbondale, IL;Computer Science Department, Southern Illinois University, Carbondale, IL;Computer Science Department, Southern Illinois University, Carbondale, IL;Computer Science Department, Southern Illinois University, Carbondale, IL

  • Venue:
  • GPC'08 Proceedings of the 3rd international conference on Advances in grid and pervasive computing
  • Year:
  • 2008

Abstract

In this paper, we address the complex problem of recovery from concurrent failures in a cluster computing environment. We propose a new approach that, unlike existing work, handles both inter-cluster orphan messages and inter-cluster lost messages. The proposed recovery approach is free from the domino effect and hence guarantees the least amount of recomputation after recovery. In addition, each process needs to save only its most recent local checkpoint, and the same holds for each cluster. Consequently, each process makes exactly one trip to stable storage during recovery. The proposed common checkpointing interval is chosen so that a process logs the minimum number of the messages it has sent. These features make our approach superior to existing work.
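
The abstract relies on the standard distinction between orphan and lost messages relative to a set of checkpoints. The following is a minimal illustrative sketch of that distinction only, not the recovery algorithm proposed in the paper. The names `Message`, `classify`, and the use of per-process event counters as checkpoint positions are assumptions introduced here for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    """Hypothetical message record; times are local event counters."""
    sender: str
    receiver: str
    send_time: int  # sender's local event count when the message was sent
    recv_time: int  # receiver's local event count when it was received


def classify(msg, checkpoint):
    """Classify a message against the latest checkpoints.

    `checkpoint` maps each process name to the local event count at which
    that process took its most recent checkpoint (an assumed representation).
    """
    send_recorded = msg.send_time <= checkpoint[msg.sender]
    recv_recorded = msg.recv_time <= checkpoint[msg.receiver]
    if recv_recorded and not send_recorded:
        # The receiver remembers a message the restored sender never sent.
        return "orphan"
    if send_recorded and not recv_recorded:
        # The restored sender will not resend; the receiver has forgotten it.
        return "lost"
    # Either both events are recorded or neither is.
    return "consistent"


checkpoint = {"P": 5, "Q": 5}
print(classify(Message("P", "Q", send_time=6, recv_time=4), checkpoint))
print(classify(Message("P", "Q", send_time=3, recv_time=7), checkpoint))
```

In this sketch, a lost message is the kind that sender-side message logging is meant to recover, since the sender can replay it from its log after rollback.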