SYNCHRONIZING CLOCKS IN A DISTRIBUTED SYSTEM

  • Authors:
  • J. Lundelius;Jennifer L Welch

  • Venue:
  • SYNCHRONIZING CLOCKS IN A DISTRIBUTED SYSTEM
  • Year:
  • 1984

Abstract

Keeping the local times of processes in a distributed system synchronized in the presence of arbitrary faults is important in many applications and is an interesting theoretical problem in its own right. To be practical, any algorithm to synchronize clocks must be able to deal with process failures and repairs, clock drift, and varying message delivery times, but these conditions complicate the design and analysis of algorithms. In this thesis, a general formal model is presented to describe a system of distributed processes, each of which has its own clock. The processes communicate by sending messages to each other, and they can set timers to cause themselves to take steps at some future times. It is proved that even if the clocks run at a perfect rate and there are no failures, an uncertainty of ε in the known message delivery time makes it impossible to synchronize the clocks of n processes any more closely than 2ε(1 - 1/n). A simple algorithm that achieves this bound is given to show that the lower bound is tight. Two fault-tolerant algorithms are presented and analyzed: one to maintain synchronization among processes whose clocks are initially close together, and another to establish synchronization in the first place. Both handle drift in the clock rates, uncertainty in the message delivery time, and arbitrary failure of just under one third of the processes. The maintenance algorithm can be modified to allow a failed process that has been repaired to be reintegrated into the system. A variant of the maintenance algorithm is used to establish the initial synchronization. It was also necessary to design an interface between the two algorithms, since we envision the processes running the start-up algorithm until the desired degree of synchronization is obtained and then switching to the maintenance algorithm.
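
The fault-free bound can be illustrated with a small simulation. The sketch below is not the thesis's algorithm. It is a standard averaging scheme written under stated assumptions: the delivery-time uncertainty is written ε and the bound is 2ε(1 - 1/n); clocks run at a perfect rate; every message delay lies in [delta - eps, delta + eps]; and all processes send at the same real time. The names `offsets`, `delta`, `eps`, `skew_bound`, and `skew_after_averaging` are hypothetical and chosen for illustration only.

```python
import random


def skew_bound(n, eps):
    """Closest achievable synchronization for n processes: 2*eps*(1 - 1/n)."""
    return 2 * eps * (1 - 1 / n)


def skew_after_averaging(offsets, delta, eps, rng):
    """Run one round of averaging and return the worst pairwise clock skew.

    offsets[p] is the (unknown to the processes) offset of process p's clock
    from real time. Each process sends its clock reading at real time 0.
    """
    n = len(offsets)
    adjusted = []
    for i in range(n):
        # A process estimates its difference from itself as exactly 0.
        estimates = [0.0]
        for k in range(n):
            if k == i:
                continue
            # Actual delay is somewhere in [delta - eps, delta + eps].
            delay = rng.uniform(delta - eps, delta + eps)
            sent_value = offsets[k]                # k's clock when it sends
            local_at_receipt = offsets[i] + delay  # i's clock when it receives
            # Assume the midpoint delay delta; the error is at most eps.
            estimates.append(sent_value + delta - local_at_receipt)
        # Shift the local clock by the average estimated difference.
        adjusted.append(offsets[i] + sum(estimates) / n)
    return max(adjusted) - min(adjusted)


if __name__ == "__main__":
    rng = random.Random(1)
    offsets = [rng.uniform(-100.0, 100.0) for _ in range(5)]
    skew = skew_after_averaging(offsets, delta=10.0, eps=0.5, rng=rng)
    print("skew:", skew, "bound:", skew_bound(5, 0.5))
```

Each estimate is off by at most eps, and a process's estimate of itself is exact, so two adjusted clocks differ by at most (1/n) * 2 * (n - 1) * eps, which equals 2ε(1 - 1/n). The sketch ignores drift and faults, which the thesis's fault-tolerant algorithms are designed to handle.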