Performing replay in an OSF DCE environment

Authors:
Yuh Ming Yong; David J. Taylor
Affiliations:
IBM Canada Ltd., 22/653, 844 Don Mills Road, North York, Ontario M3C 1V7; Department of Computer Science, University of Waterloo, Waterloo, Ontario N2L 3G1
Venue:
CASCON '95 Proceedings of the 1995 conference of the Centre for Advanced Studies on Collaborative research
Year:
1995

Abstract

Debugging a distributed application is inherently difficult because such factors as network delay and varying system loads may cause the behaviour of the application to change from one execution to another. Using a standard debugger with such an application is also likely to perturb its behaviour sufficiently that some bugs will not be manifested when the debugger is used. Fortunately, a replay mechanism can be used to circumvent these problems. Replaying a program involves a monitoring phase, during which the behaviour of the program is logged, and a replay phase, during which the behaviour of the program is coerced to follow the partial order logged during the monitoring phase.

In this paper, we describe an implementation of the replay technique for use with the Open Software Foundation's Distributed Computing Environment (OSF DCE). OSF DCE presents a special problem in that servers are multithreaded and incoming RPCs are assigned dynamically to threads. To preserve the event partial order, it is necessary to ensure that an RPC uses the same thread on replay. This difficulty is dealt with by ensuring that the DCE thread service sees the same resource state during replay and hence makes the same thread-allocation decision.
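The two phases described in the abstract can be illustrated with a small, self-contained sketch. It is not the paper's implementation and uses no DCE API: it assumes a single process in which several threads contend for one shared section, so the logged order is a total order on that section rather than a partial order across processes. All names (`Recorder`, `Replayer`, `run`) are hypothetical and chosen for illustration only. The recorder logs which thread entered the section at each step; the replayer admits a thread only when its name is next in the log, which coerces a second execution to follow the recorded order.

```python
import threading


class Recorder:
    """Monitoring phase (illustrative): log the order in which threads enter."""

    def __init__(self):
        self._lock = threading.Lock()
        self.log = []

    def enter(self, name):
        self._lock.acquire()
        self.log.append(name)  # appended while holding the lock

    def leave(self):
        self._lock.release()


class Replayer:
    """Replay phase (illustrative): admit threads only in the logged order.

    Assumes every thread has a unique name and that the replayed run
    performs the same number of entries as the logged run.
    """

    def __init__(self, log):
        self._order = list(log)
        self._next = 0
        self._cond = threading.Condition()

    def enter(self, name):
        with self._cond:
            # Block until the log says it is this thread's turn.
            self._cond.wait_for(
                lambda: self._next < len(self._order)
                and self._order[self._next] == name
            )

    def leave(self):
        with self._cond:
            self._next += 1          # advance to the next logged event
            self._cond.notify_all()  # wake waiters so they re-check their turn


def run(gate, names, rounds):
    """Run one worker thread per name; each enters the section `rounds` times."""
    shared = []

    def worker(name):
        for i in range(rounds):
            gate.enter(name)
            shared.append((name, i))  # the "event" whose order we care about
            gate.leave()

    threads = [threading.Thread(target=worker, args=(n,)) for n in names]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared


if __name__ == "__main__":
    recorder = Recorder()
    first = run(recorder, ["a", "b", "c"], rounds=3)
    second = run(Replayer(recorder.log), ["a", "b", "c"], rounds=3)
    print("replay matches monitored run:", first == second)
```

The sketch deliberately leaves out the DCE-specific difficulty the paper addresses: here each logical activity is bound to a named thread in advance, whereas a DCE server assigns incoming RPCs to threads dynamically, so a real replay mechanism must also reproduce that assignment.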