Re-execution of Distributed Programs to Detect Bugs Hidden by Racing

Authors:
Affiliations:
Venue:
HICSS '97 Proceedings of the 30th Hawaii International Conference on System Sciences: Software Technology and Architecture - Volume 1
Year:
1997

Citing 0
Cited 6

Debugging distributed programs using controlled re-execution

Proceedings of the nineteenth annual ACM symposium on Principles of distributed computing
Debugging in a Distributed World: Observation and Control

ASSET '98 Proceedings of the 1998 IEEE Workshop on Application-Specific Software Engineering and Technology
A debugger for flow graph based parallel applications

Proceedings of the 2007 ACM workshop on Parallel and distributed systems: testing and debugging
Dynamic testing of flow graph based parallel applications

PADTAD '08 Proceedings of the 6th workshop on Parallel and distributed systems: testing, analysis, and debugging
Robust non-intrusive record-replay with processor extraction

Proceedings of the 8th Workshop on Parallel and Distributed Systems: Testing, Analysis, and Debugging
Detecting unaffected message races in parallel programs

GPC'06 Proceedings of the First international conference on Advances in Grid and Pervasive Computing

Abstract

Finding errors in non-deterministic programs is complicated by the fact that an anomaly may occur during one program execution, and not the next. Our objective is to provide a practical yet powerful testing environment for distributed systems, using re-execution. We focus on re-executing the program, under a strictly different message ordering. We show that messages are grouped into waves, such that any two messages from different waves must always be received in the same order. We provide an algorithm that produces a re-execution that maximizes the number of reordered pairs of message delivery events. We also provide an efficient online algorithm for detecting racing messages.
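
The abstract does not spell out how racing messages are detected, so the following is only a minimal illustrative sketch, not the paper's algorithm. It assumes a standard vector-clock formulation: two messages delivered to the same process race if the receipt of the earlier one does not causally precede the send of the later one, so their delivery order could differ in another run. It works offline on a recorded trace, whereas the paper describes an online algorithm. The names `Message`, `happened_before`, and `racing_pairs` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Clock = Tuple[int, ...]  # one vector-clock entry per process


def happened_before(a: Clock, b: Clock) -> bool:
    """True if the event stamped a causally precedes the event stamped b."""
    return all(x <= y for x, y in zip(a, b)) and a != b


@dataclass(frozen=True)
class Message:
    name: str
    receiver: int      # index of the receiving process
    send_clock: Clock  # sender's vector clock at the send event
    recv_clock: Clock  # receiver's vector clock at the receive event


def racing_pairs(messages: List[Message]) -> List[Tuple[str, str]]:
    """Return pairs (earlier, later) received by one process whose order is not forced."""
    by_receiver: Dict[int, List[Message]] = {}
    for m in messages:
        by_receiver.setdefault(m.receiver, []).append(m)

    races: List[Tuple[str, str]] = []
    for received in by_receiver.values():
        # A process's own clock entry grows with each local event,
        # so it orders that process's receive events.
        received.sort(key=lambda m: m.recv_clock[m.receiver])
        for i, first in enumerate(received):
            for later in received[i + 1:]:
                # If receiving `first` does not precede sending `later`,
                # `later` could have arrived first in another execution.
                if not happened_before(first.recv_clock, later.send_clock):
                    races.append((first.name, later.name))
    return races


# Hypothetical trace with three processes P0, P1, P2.
# a and b are sent independently to P2; d is sent by P0 only after
# P0 has heard back from P2 (via c), so d must follow both a and b.
trace = [
    Message("a", receiver=2, send_clock=(1, 0, 0), recv_clock=(1, 0, 1)),
    Message("b", receiver=2, send_clock=(0, 1, 0), recv_clock=(1, 1, 2)),
    Message("c", receiver=0, send_clock=(1, 1, 3), recv_clock=(2, 1, 3)),
    Message("d", receiver=2, send_clock=(3, 1, 3), recv_clock=(3, 1, 4)),
]
races = racing_pairs(trace)
print(races)
```

In this trace only a and b race. Message d is causally forced to arrive after both, which is loosely the kind of fixed ordering the abstract attributes to messages from different waves.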