Methods for Observing Global Properties in Distributed Systems

  • Authors:
  • Vijay K. Garg

  • Venue:
  • IEEE Parallel & Distributed Technology: Systems & Applications
  • Year:
  • 1997

Abstract

A fundamental problem in developing distributed software is that no process has access to the global state. Thus, computing a global predicate or function, a need that arises frequently in distributed systems, typically requires significant programming. Being able to observe a distributed computation is useful for many fundamental problems in distributed software, such as debugging, testing, and fault tolerance. After a program is debugged and tested, it must be monitored for fault tolerance, which again requires something that will observe the global state. Finally, the ability to observe global predicates generalizes algorithms for many earlier problems, such as detecting program termination, token loss, and deadlock. Research on how to detect global predicates has yielded three sets of algorithms. In the global snapshot approach, global snapshots of the computation are computed repeatedly until the desired predicate becomes true. However, this approach works only for stable predicates such as deadlock and termination, which do not turn false once they become true. In the second set of algorithms, a lattice of global states is constructed. Unlike the global snapshot approach, this approach lets users detect unstable predicates. However, it can mean exploring a prohibitive number of global states. This article surveys algorithms that use a third approach, which exploits the structure of the predicate but does not build a lattice. Instead, these algorithms examine the computation itself to deduce whether a predicate became true. They are computationally efficient and can be used to detect even unstable predicates.
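The contrast between the second and third approaches can be illustrated with a minimal Python sketch. It is not taken from the article. It assumes a recorded computation in which every local state carries a vector clock, and a conjunctive predicate (all local predicates true in one consistent global state) as the structured predicate being exploited. The names `clocks`, `truth`, `detect_by_lattice`, and `detect_conjunctive` are hypothetical, and the two-process trace is made up for illustration.

```python
from itertools import product

# Assumed trace format (hypothetical): clocks[i][k] is the vector clock of
# process i after k of its events, and truth[i][k] is the value of process i's
# local predicate in that local state.
# Here P0 runs e1, send(m), e3 and P1 runs f1, receive(m), f3.
clocks = [
    [(0, 0), (1, 0), (2, 0), (3, 0)],   # P0
    [(0, 0), (0, 1), (2, 2), (2, 3)],   # P1: state 2 follows receive(m)
]
truth = [
    [False, True, False, True],         # P0's local predicate
    [False, False, True, False],        # P1's local predicate
]

def is_consistent(clocks, cut):
    """A cut is consistent if no local state has seen an event beyond the cut."""
    n = len(cut)
    return all(clocks[j][cut[j]][i] <= cut[i]
               for i in range(n) for j in range(n))

def detect_by_lattice(clocks, truth):
    """Second approach (sketch): enumerate every consistent global state."""
    ranges = [range(len(c)) for c in clocks]
    consistent = [cut for cut in product(*ranges) if is_consistent(clocks, cut)]
    hits = [cut for cut in consistent
            if all(truth[i][k] for i, k in enumerate(cut))]
    return hits, len(consistent)

def next_true(truth_i, start):
    """Index of the first local state >= start where the local predicate holds."""
    k = start
    while k < len(truth_i) and not truth_i[k]:
        k += 1
    return k  # equals len(truth_i) when no such state exists

def detect_conjunctive(clocks, truth):
    """Third approach (sketch): advance one candidate state per process,
    discarding candidates that another candidate has already causally passed."""
    n = len(clocks)
    cut = [next_true(truth[i], 0) for i in range(n)]
    if any(cut[i] == len(truth[i]) for i in range(n)):
        return None
    changed = True
    while changed:
        changed = False
        for i in range(n):
            for j in range(n):
                seen = clocks[j][cut[j]][i]   # events of i known to j's candidate
                if seen > cut[i]:             # i's candidate is too old
                    cut[i] = next_true(truth[i], seen)
                    if cut[i] == len(truth[i]):
                        return None           # predicate never holds
                    changed = True
    return tuple(cut)

hits, explored = detect_by_lattice(clocks, truth)
print("lattice:", hits, "after checking", explored, "consistent states")
print("direct :", detect_conjunctive(clocks, truth))
```

In this sketch the lattice search visits every consistent global state, a number that can grow exponentially with the number of processes, whereas the direct search only moves each process's candidate forward and never revisits a local state.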