Performing replay in an OSF DCE environment

  • Authors:
  • Yuh Ming Yong; David J. Taylor

  • Affiliations:
  • IBM Canada Ltd., 22/653, 844 Don Mills Road, North York, Ontario M3C 1V7; Department of Computer Science, University of Waterloo, Waterloo, Ontario N2L 3G1

  • Venue:
  • CASCON '95 Proceedings of the 1995 conference of the Centre for Advanced Studies on Collaborative research
  • Year:
  • 1995

Abstract

Debugging a distributed application is inherently difficult because factors such as network delay and varying system loads may cause the application's behaviour to change from one execution to another. Using a standard debugger with such an application is also likely to perturb its behaviour enough that some bugs do not manifest while the debugger is in use. Fortunately, a replay mechanism can be used to circumvent these problems. Replaying a program involves a monitoring phase, during which the program's behaviour is logged, and a replay phase, during which the program is coerced to follow the partial order of events logged during the monitoring phase.

In this paper, we describe an implementation of the replay technique for use with the Open Software Foundation's Distributed Computing Environment (OSF DCE). OSF DCE presents a special problem in that servers are multithreaded and incoming RPCs are assigned dynamically to threads. To preserve the event partial order, it is necessary to ensure that an RPC uses the same thread on replay. This difficulty is addressed by ensuring that the DCE thread service sees the same resource state during replay and hence makes the same thread-allocation decision.
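The key idea in the last sentences, that a thread-allocation decision is a function of the pool's resource state, so reproducing that state reproduces the decision, can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the authors' DCE implementation: the abstract does not say how the DCE thread service picks a thread, so the "lowest-numbered idle thread" policy, the `ThreadPool`, `run` and `replay` names, and the `("call", id)` / `("done", id)` event format are all hypothetical. During monitoring, the order of allocate and release events is logged; during replay, events that arrive in a different order are held back until they match the next logged entry, so the pool passes through the same states and assigns each RPC the same thread.

```python
from collections import deque


class ThreadPool:
    """Hypothetical stand-in for a server's thread pool. The allocation
    decision depends only on which threads are currently idle (assumed
    policy: lowest-numbered idle thread)."""

    def __init__(self, size: int) -> None:
        self.idle = set(range(size))

    def allocate(self) -> int:
        if not self.idle:
            raise RuntimeError("no idle thread")
        tid = min(self.idle)
        self.idle.remove(tid)
        return tid

    def release(self, tid: int) -> None:
        self.idle.add(tid)


def run(events, pool_size):
    """Monitoring phase: apply events in the order given and log them.
    Each event is ("call", rpc_id) or ("done", rpc_id); each event is
    assumed to occur at most once. Returns (log, thread_per_rpc)."""
    pool = ThreadPool(pool_size)
    assigned = {}
    log = []
    for kind, rpc in events:
        if kind == "call":
            assigned[rpc] = pool.allocate()
        else:
            pool.release(assigned[rpc])
        log.append((kind, rpc))
    return log, assigned


def replay(log, arrivals, pool_size):
    """Replay phase: events may arrive in a different order (e.g. because
    of network delay), but each one is held back until it is the next
    entry in the monitoring log, so the pool sees the same state sequence."""
    pending = set()
    ordered = []
    expected = deque(log)
    for ev in arrivals:
        pending.add(ev)
        # Release every buffered event that is now next in the log.
        while expected and expected[0] in pending:
            nxt = expected.popleft()
            pending.remove(nxt)
            ordered.append(nxt)
    if expected:
        raise RuntimeError("replay stalled waiting for " + repr(expected[0]))
    _, assigned = run(ordered, pool_size)
    return assigned


if __name__ == "__main__":
    monitored = [("call", "a"), ("call", "b"), ("done", "a"),
                 ("call", "c"), ("done", "b"), ("done", "c")]
    log, original = run(monitored, pool_size=2)
    # A perturbed delivery order, as might occur on a later execution.
    arrivals = [("call", "b"), ("call", "a"), ("done", "b"),
                ("done", "a"), ("call", "c"), ("done", "c")]
    print("monitored:", original)
    print("naive rerun:", run(arrivals, 2)[1])
    print("replayed:", replay(log, arrivals, 2))
```

With two threads, the monitored run gives `a` thread 0, `b` thread 1 and `c` thread 0. Simply re-executing with the perturbed arrival order gives `a` thread 1 instead, which would break the logged partial order; the replayer restores the original assignment without changing the allocation policy itself, only the sequence of states the pool observes.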