Finding missing synchronization in a distributed computation using controlled re-execution

  • Authors:
  • Neeraj Mittal;Vijay K. Garg

  • Affiliations:
  • Department of Computer Science, The University of Texas at Dallas, Richardson, TX;Department of Electrical and Computer Engineering, The University of Texas at Austin, Austin, TX

  • Venue:
  • Distributed Computing
  • Year:
  • 2004

Quantified Score

Hi-index 0.00

Visualization

Abstract

Correct distributed programs are hard to write. Not surprisingly, distributed systems are especially vulnerable to software faults. Testing and debugging is an important way to improve the reliability of distributed systems. A distributed debugger equipped with the mechanism to re-execute the traced computation in a controlled fashion can greatly facilitate the detection and localization of bugs. This approach gives rise to a general problem of predicate control, which takes a computation and a safety property specified on the computation as inputs, and produces a controlled computation, with added synchronization, that maintains the given safety property as output. We devise efficient control algorithms for two classes of useful predicates, namely region predicates and disjunctive predicates. For the former, we prove that the control algorithm is optimal in the sense that it guarantees maximum concurrency possible in the controlled computation. For the latter, we prove that our control algorithm generates the least number of synchronization dependencies and therefore has optimal message-complexity. Furthermore, we provide a necessary and sufficient condition under which it is possible to efficiently compute a minimal controlling synchronization for a general predicate. We also give an algorithm to compute such a synchronization under the condition provided.