An asynchronous recovery algorithm based on a staggered quasi-synchronous checkpointing algorithm

  • Authors:
  • D. Manivannan; Q. Jiang; J. Yang; K. E. Persson; M. Singhal

  • Affiliations:
  • Computer Science Department, University of Kentucky, Lexington, KY (all authors)

  • Venue:
  • IWDC'05 Proceedings of the 7th international conference on Distributed Computing
  • Year:
  • 2005

Abstract

Checkpointing and rollback recovery are established techniques for handling failures in distributed systems. Under synchronous checkpointing, every process involved in the distributed computation takes a checkpoint at almost the same time. This causes contention for network stable storage and hence degrades performance. To overcome this problem, checkpoint staggering, under which the checkpoints of the various processes are taken in a staggered manner, has been proposed. In this paper, we propose a staggered quasi-synchronous checkpointing algorithm that reduces contention for network stable storage without any synchronization overhead. We also present an asynchronous recovery algorithm based on this checkpointing algorithm.
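
The contention argument can be illustrated with a toy model. The sketch below is not the algorithm proposed in the paper; it only assumes that each process needs a fixed `write_time` to save its checkpoint to shared stable storage, and that staggering delays process `i` by `i * write_time`. The function names and parameters are hypothetical and chosen for illustration only.

```python
def checkpoint_intervals(num_procs, start, write_time, stagger):
    """Return the (begin, end) storage-write interval of each process.

    Assumption: without staggering, all processes begin writing at `start`;
    with staggering, process i begins at start + i * write_time.
    """
    offset = write_time if stagger else 0
    return [(start + i * offset, start + i * offset + write_time)
            for i in range(num_procs)]


def peak_concurrent_writers(intervals):
    """Count the maximum number of processes writing at the same time."""
    events = []
    for begin, end in intervals:
        events.append((begin, 1))   # a process starts writing
        events.append((end, -1))    # a process finishes writing
    # At equal times, handle finishes (-1) before starts (+1), so that
    # back-to-back writes are not counted as overlapping.
    events.sort()
    peak = current = 0
    for _, delta in events:
        current += delta
        peak = max(peak, current)
    return peak


simultaneous = checkpoint_intervals(4, start=10, write_time=2, stagger=False)
staggered = checkpoint_intervals(4, start=10, write_time=2, stagger=True)
print(peak_concurrent_writers(simultaneous))
print(peak_concurrent_writers(staggered))
```

In this simplified model the peak number of concurrent writers is the quantity that staggering is meant to lower. The model leaves out how the checkpoints are kept consistent without synchronization messages, which is the part the paper addresses.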