A survey of rollback-recovery protocols in message-passing systems

  • Authors:
  • E. N. (Mootaz) Elnozahy; Lorenzo Alvisi; Yi-Min Wang; David B. Johnson

  • Affiliations:
  • IBM Research, Austin, TX; The University of Texas at Austin, Austin, TX; Microsoft Research, Redmond, WA; Rice University, Houston, TX

  • Venue:
  • ACM Computing Surveys (CSUR)
  • Year:
  • 2002

Abstract

This survey covers rollback-recovery techniques that do not require special language constructs. In the first part of the survey we classify rollback-recovery protocols into checkpoint-based and log-based. Checkpoint-based protocols rely solely on checkpointing for system state restoration. Checkpointing can be coordinated, uncoordinated, or communication-induced. Log-based protocols combine checkpointing with logging of nondeterministic events, encoded in tuples called determinants. Depending on how determinants are logged, log-based protocols can be pessimistic, optimistic, or causal. Throughout the survey, we highlight the research issues that are at the core of rollback-recovery and present the solutions that currently address them. We also compare the performance of different rollback-recovery protocols with respect to a series of desirable properties and discuss the issues that arise in the practical implementations of these protocols.
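To make the log-based idea in the abstract concrete, the following is a minimal sketch of a single process that combines checkpointing with pessimistic logging of message deliveries. It is an illustration under stated assumptions, not a protocol taken from the survey: the names `Determinant`, `Process`, `deliver`, and `recover` are hypothetical, the determinant fields (sender, send sequence number, receiver, receive sequence number) are one plausible encoding, the message payload is stored alongside the determinant for simplicity, and "stable storage" is simulated by in-memory fields that the simulated crash leaves untouched.

```python
from dataclasses import dataclass, field
from typing import NamedTuple


class Determinant(NamedTuple):
    """Hypothetical record of one nondeterministic event (a message delivery)."""
    sender: str
    ssn: int       # send sequence number at the sender
    receiver: str
    rsn: int       # receive sequence number: delivery order at the receiver
    payload: int   # assumption: payload is logged together with the determinant


@dataclass
class Process:
    name: str
    total: int = 0   # volatile application state
    rsn: int = 0     # volatile count of deliveries applied so far
    # Simulated stable storage: these two fields survive crash().
    stable_log: list = field(default_factory=list)
    stable_checkpoint: dict = field(default_factory=dict)

    def checkpoint(self) -> None:
        """Save the current state to (simulated) stable storage."""
        self.stable_checkpoint = {"total": self.total, "rsn": self.rsn}

    def deliver(self, sender: str, ssn: int, payload: int) -> None:
        """Pessimistic logging: record the determinant before applying it."""
        det = Determinant(sender, ssn, self.name, self.rsn + 1, payload)
        self.stable_log.append(det)
        self._apply(det)

    def _apply(self, det: Determinant) -> None:
        # The update is order-sensitive, so replay must follow rsn order.
        self.total = self.total * 2 + det.payload
        self.rsn = det.rsn

    def crash(self) -> None:
        """Lose all volatile state; stable storage is kept."""
        self.total = 0
        self.rsn = 0

    def recover(self) -> None:
        """Restore the last checkpoint, then replay later deliveries in order."""
        self.total = self.stable_checkpoint.get("total", 0)
        self.rsn = self.stable_checkpoint.get("rsn", 0)
        for det in sorted(self.stable_log, key=lambda d: d.rsn):
            if det.rsn > self.rsn:
                self._apply(det)


p = Process("p")
p.deliver("q", 1, 3)
p.checkpoint()
p.deliver("r", 1, 5)
p.deliver("q", 2, 7)
state_before_crash = (p.total, p.rsn)
p.crash()
p.recover()
```

In this sketch, a purely checkpoint-based scheme would bring the process back only to the state saved by `checkpoint()`, whereas replaying the logged determinants also reconstructs the two deliveries that followed it. Logging before delivery is what makes the sketch pessimistic; an optimistic or causal variant would relax when and where the determinant reaches stable storage.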