Use of Common Time Base for Checkpointing and Rollback Recovery in a Distributed System

  • Authors:
  • Parameswaran Ramanathan; Kang G. Shin

  • Affiliations:
  • Univ. of Wisconsin-Madison, Madison; The Univ. of Michigan, Ann Arbor

  • Venue:
  • IEEE Transactions on Software Engineering
  • Year:
  • 1993

Abstract

An approach to checkpointing and rollback recovery in a distributed computing system using a common time base is proposed. The common time base is established with a hardware clock synchronization algorithm. It is combined with the idea of pseudo-recovery points to develop a checkpointing algorithm with three advantages: a shorter wait for commitment when establishing recovery lines, fewer messages exchanged, and lower memory requirements. These advantages are assessed quantitatively with a probabilistic model.
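
The general idea of checkpointing against a common time base can be illustrated with a small simulation. The sketch below is not the authors' algorithm: the names `Process`, `skew`, `period`, and `recovery_line` are hypothetical, clock skew is modeled as a fixed offset, and pseudo-recovery points and messages in transit are ignored entirely. It only shows that processes whose clocks agree to within a small bound can each decide locally when to checkpoint, and still end up with a matching set of checkpoint indices without exchanging coordination messages.

```python
from dataclasses import dataclass, field


@dataclass
class Process:
    name: str
    skew: float  # assumed fixed offset of this node's clock from true time (seconds)
    state: int = 0
    checkpoints: dict = field(default_factory=dict)  # interval index -> saved state

    def local_time(self, true_time: float) -> float:
        # Synchronized clocks are modeled as true time plus a small bounded skew.
        return true_time + self.skew

    def tick(self, true_time: float, period: float) -> None:
        """Do one unit of work, then checkpoint if a new interval began locally."""
        self.state += 1
        index = int(self.local_time(true_time) // period)
        if index > 0 and index not in self.checkpoints:
            # Decision is made from the local clock only; no messages are sent.
            self.checkpoints[index] = self.state


def recovery_line(processes):
    """Latest interval index for which every process holds a checkpoint."""
    common = set.intersection(*(set(p.checkpoints) for p in processes))
    return max(common) if common else None


if __name__ == "__main__":
    procs = [Process("p0", 0.0), Process("p1", 0.3), Process("p2", -0.3)]
    for t in range(36):  # true time 0..35 s, checkpoint period 10 s
        for p in procs:
            p.tick(float(t), period=10.0)
    print(recovery_line(procs))
```

In this toy run each process checkpoints once per 10-second interval of its own clock, and the latest index held by all of them serves as a candidate recovery line. A real scheme must also account for messages sent near interval boundaries, which this sketch does not model.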