Model-based performance evaluation of distributed checkpointing protocols

  • Authors:
  • Adnan Agbaria; Roy Friedman

  • Affiliations:
  • IBM Haifa Research Lab, Mount Carmel, Haifa 31905, Israel; Computer Science Department, Technion - Israel Institute of Technology, Haifa 32000, Israel

  • Venue:
  • Performance Evaluation
  • Year:
  • 2008

Abstract

A large number of distributed checkpointing protocols have appeared in the literature. However, to make an informed decision about which protocol performs best in a given environment, one needs an objective measure for comparing them, since a protocol that is best in one environment may not be best in another. This paper presents such a measure, called the overhead ratio, for evaluating distributed checkpointing protocols. The measure extends previous evaluation schemes by incorporating several additional parameters that are inherent in distributed environments. In particular, it takes into account the rollback propagation of the protocol, which affects the length of the recovery process and therefore the expected program run-time in executions that involve failures and recoveries. Using the overhead ratio as an evaluation technique, the paper also analyses several known protocols and compares their overhead ratios.