Distributed recovery with K-optimistic logging

  • Authors: Om P. Damani; Yi-Min Wang; Vijay K. Garg

  • Affiliations: IBM T.J. Watson Research Center, Hawthorne, NY; Microsoft Research, Redmond, WA; Department of Elect. and Computer Engineering, University of Texas at Austin

  • Venue: Journal of Parallel and Distributed Computing
  • Year: 2003

Abstract

Fault-tolerance techniques based on checkpointing and message logging have been increasingly used in real-world applications to reduce service down-time. Most industrial applications have chosen pessimistic logging because it allows fast and localized recovery. The price that they must pay, however, is the high failure-free overhead. In this paper, we introduce the concept of K-optimistic logging, where K is the degree of optimism that can be used to fine-tune the trade-off between failure-free overhead and recovery efficiency. Traditional pessimistic logging and optimistic logging then become the two extremes in the entire spectrum spanned by K-optimistic logging. Our results generalize several previously known protocols.

Our approach is to prove that only dependencies on those states that may be lost upon a failure need to be tracked on-line, and so transitive dependency tracking can be performed with a variable-size vector. The size of the vector piggybacked on a message then indicates the number of processes whose failures may revoke the message, and K corresponds to the upper bound on the vector size. Furthermore, the parameter K is dynamically tunable in response to changing system characteristics.
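
The role of K as a bound on the piggybacked vector can be illustrated with a small sketch. This is not the paper's protocol: the `Process` class, its method names, and the `(incarnation, index)` encoding of a state interval are hypothetical, and the sketch assumes that a message is simply held back until at most K of its dependencies are still non-stable (not yet logged). Rollback, incarnation handling after a failure, and FIFO ordering of released messages are all left out.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical encoding of a state interval: (incarnation number, interval index).
Interval = Tuple[int, int]
Vector = Dict[int, Interval]  # process id -> latest non-stable interval depended on


@dataclass
class Process:
    pid: int
    k: int  # degree of optimism: max non-stable dependencies on a released message
    deps: Vector = field(default_factory=dict)  # only non-stable dependencies are kept
    held: List[Tuple[str, Vector]] = field(default_factory=list)  # messages awaiting release

    def deliver(self, piggybacked: Vector) -> None:
        """Merge the vector carried by an incoming message (transitive tracking)."""
        for pid, interval in piggybacked.items():
            current = self.deps.get(pid)
            if current is None or interval > current:
                self.deps[pid] = interval

    def send(self, payload: str) -> List[Tuple[str, Vector]]:
        """Buffer a message with a snapshot of the current dependencies, then release what is allowed."""
        self.held.append((payload, dict(self.deps)))
        return self.flush()

    def mark_stable(self, pid: int, interval: Interval) -> List[Tuple[str, Vector]]:
        """Record that `interval` of process `pid` is logged, so it can no longer be lost."""
        for vec in [self.deps] + [v for _, v in self.held]:
            if pid in vec and vec[pid] <= interval:
                del vec[pid]  # stable states need not be tracked any more
        return self.flush()

    def flush(self) -> List[Tuple[str, Vector]]:
        """Release every held message whose vector has at most K entries."""
        ready = [m for m in self.held if len(m[1]) <= self.k]
        self.held = [m for m in self.held if len(m[1]) > self.k]
        return ready
```

Under these assumptions, `k=0` behaves pessimistically (a message leaves only once nothing it depends on can be lost), while setting `k` to the number of processes never delays a message, which corresponds to the fully optimistic end of the spectrum. Because `k` is an ordinary field, it could be changed at run time, mirroring the abstract's remark that K is dynamically tunable.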