Debugging a distributed application is inherently difficult because such factors as network delay and varying system loads may cause the behaviour of the application to change from one execution to another. Using a standard debugger with such an application is also likely to perturb its behaviour sufficiently that some bugs will not be manifested when the debugger is used. Fortunately, a replay mechanism can be used to circumvent these problems. Replaying a program involves a monitoring phase, during which the behaviour of the program is logged, and a replay phase, during which the behaviour of the program is coerced to follow the partial order logged during the monitoring phase.

In this paper, we describe an implementation of the replay technique for use with the Open Software Foundation's Distributed Computing Environment (OSF DCE). OSF DCE presents a special problem in that servers are multithreaded and incoming RPCs are assigned dynamically to threads. To preserve the event partial order, it is necessary to ensure that an RPC uses the same thread on replay. This difficulty is dealt with by ensuring that the DCE thread service sees the same resource state during replay and hence makes the same thread-allocation decision.
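The two phases described above can be illustrated with a small sketch. It is not the paper's DCE implementation: it assumes a much simpler setting in which named worker threads contend for one critical section, and it logs a total order of entries rather than a partial order of RPC events. The names `OrderGate` and `execute` are hypothetical and chosen only for this illustration.

```python
import threading


class OrderGate:
    """Hypothetical helper: records, or later enforces, the order in which
    named workers enter a single critical section."""

    def __init__(self, recorded=None):
        # With no log we are in the monitoring phase; with a log we replay it.
        self.replaying = recorded is not None
        self.order = list(recorded) if self.replaying else []
        self.next_index = 0
        self.cond = threading.Condition()

    def run(self, name, action):
        with self.cond:
            if self.replaying:
                # Replay phase: block until the log says it is this worker's turn.
                self.cond.wait_for(lambda: self.order[self.next_index] == name)
            else:
                # Monitoring phase: log the order that happened to occur.
                self.order.append(name)
            action(name)
            self.next_index += 1
            self.cond.notify_all()


def execute(worker_names, recorded=None):
    """Run one worker thread per name; return (log, observed entry order)."""
    gate = OrderGate(recorded)
    observed = []
    threads = [
        threading.Thread(target=gate.run, args=(name, observed.append))
        for name in worker_names
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return gate.order, observed


if __name__ == "__main__":
    names = ["t1", "t2", "t3", "t4"]
    logged, first_run = execute(names)                      # monitoring phase
    _, second_run = execute(names[::-1], recorded=logged)   # replay phase
    print(first_run, second_run)
```

In the replay call the threads are started in the reverse order, yet the gate holds each one back until the log reaches its name, so the observed order matches the logged one. The thread-allocation problem the paper addresses for DCE servers is a separate matter that this sketch does not attempt to model.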