Debugging distributed programs using controlled re-execution
Proceedings of the nineteenth annual ACM symposium on Principles of distributed computing
Debugging in a Distributed World: Observation and Control
ASSET '98 Proceedings of the 1998 IEEE Workshop on Application - Specific Software Engineering and Technology
A debugger for flow graph based parallel applications
Proceedings of the 2007 ACM workshop on Parallel and distributed systems: testing and debugging
Dynamic testing of flow graph based parallel applications
PADTAD '08 Proceedings of the 6th workshop on Parallel and distributed systems: testing, analysis, and debugging
Robust non-intrusive record-replay with processor extraction
Proceedings of the 8th Workshop on Parallel and Distributed Systems: Testing, Analysis, and Debugging
Detecting unaffected message races in parallel programs
GPC'06 Proceedings of the First international conference on Advances in Grid and Pervasive Computing
Finding errors in non-deterministic programs is complicated by the fact that an anomaly may occur during one program execution, and not the next. Our objective is to provide a practical yet powerful testing environment for distributed systems, using re-execution. We focus on re-executing the program under a strictly different message ordering. We show that messages are grouped into waves, such that any two messages from different waves must always be received in the same order. We provide an algorithm that produces a re-execution that maximizes the number of reordered pairs of message delivery events. We also provide an efficient online algorithm for detecting racing messages.
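
To make the notion of racing messages concrete, the following is a minimal offline sketch, not the online algorithm of the paper. It assumes a common simplified definition: two messages delivered to the same process race if their send events are concurrent, that is, neither send happened before the other under vector clocks. The names `Message`, `happened_before`, and `racing_pairs` are hypothetical and chosen only for illustration.

```python
from itertools import combinations
from typing import NamedTuple, Tuple


class Message(NamedTuple):
    name: str
    receiver: int                 # index of the destination process
    send_clock: Tuple[int, ...]   # vector clock of the send event (assumed given)


def happened_before(a: Tuple[int, ...], b: Tuple[int, ...]) -> bool:
    """Return True if vector clock a is strictly smaller than b."""
    return a != b and all(x <= y for x, y in zip(a, b))


def racing_pairs(messages):
    """List pairs of messages to the same receiver whose sends are concurrent."""
    pairs = []
    for m1, m2 in combinations(messages, 2):
        if m1.receiver != m2.receiver:
            continue  # only messages to the same process can be reordered
        ordered = (happened_before(m1.send_clock, m2.send_clock)
                   or happened_before(m2.send_clock, m1.send_clock))
        if not ordered:
            pairs.append((m1.name, m2.name))
    return pairs


# Hypothetical trace with three processes P0, P1, P2; all messages go to P2.
# m1 is sent by P0 and m2 by P1 with no causal link between the sends.
# m3 is sent by P1 after it has heard from P0, so both m1 and m2 precede it.
example = [
    Message("m1", receiver=2, send_clock=(1, 0, 0)),
    Message("m2", receiver=2, send_clock=(0, 1, 0)),
    Message("m3", receiver=2, send_clock=(2, 3, 0)),
]
result = racing_pairs(example)
```

In this hypothetical trace only m1 and m2 can be delivered in either order, so a re-execution could swap them, while m3 must always follow both. The pairwise comparison is quadratic in the number of messages; it is meant to illustrate the definition, not to be efficient.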