A framework for automatic identification of the best checkpoint and recovery protocol

  • Authors:
  • Himadri S. Paul; Arobinda Gupta; Amit Sharma

  • Affiliations:
  • Department of Computer Science and Engineering, Indian Institute of Technology, Kharagpur, India; Department of Computer Science and Engineering, Indian Institute of Technology, Kharagpur, India; Department of Computer Science, University of Illinois at Urbana-Champaign, Illinois

  • Venue:
  • IWDC'04 Proceedings of the 6th international conference on Distributed Computing
  • Year:
  • 2004

Abstract

Fault tolerance is important for increasing the reliability and throughput of a distributed system. Checkpoint and recovery protocols have been proposed as a fault tolerance mechanism for non-critical applications. The performance of these protocols plays an important role in the overall performance of a distributed system, and it depends on the characteristics of both the system and the application. In this paper, we propose a novel technique to automatically identify the checkpoint and recovery protocol that is likely to perform best for a given system and the application it is currently running. We present experimental results showing that the scheme can efficiently determine a suitable checkpoint and recovery protocol for many applications.