A major source of problems when debugging message-passing programs is the nondeterministic behavior of promiscuous receive and nonblocking test operations. This nondeterminism rules out cyclic debugging techniques, because the intrusion caused by a debugger is often large enough to change the order in which processes interact. This paper describes solutions for efficiently recording and replaying the nondeterministic features of message-passing libraries (MPLs) such as MPI or PVM. For promiscuous receive operations it is sufficient to record the sender of each message, and for nonblocking test operations it is sufficient to record the number of failed tests. The proposed solutions have been implemented for an existing MPI library. Performance measurements show that the time overhead of both record and replay executions is very low relative to the original (nondeterministic) execution, while the log files remain very small.
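The two logging rules stated in the abstract can be illustrated with a small sketch. This is not the paper's implementation and does not use a real MPI or PVM binding: the `Mailbox` class, the `ANY_SOURCE` constant, and all function and class names below are hypothetical stand-ins, and the toy assumes messages have already arrived when a receive is issued, so blocking is not modeled. It only shows that logging the sender per promiscuous receive, and the failed-test count per nonblocking test sequence, is enough to reproduce the recorded order.

```python
ANY_SOURCE = -1  # hypothetical wildcard, standing in for a promiscuous receive


class Mailbox:
    """Toy mailbox (an assumption of this sketch): messages in arrival order."""

    def __init__(self):
        self.arrived = []  # list of (sender, payload)

    def deliver(self, sender, payload):
        self.arrived.append((sender, payload))

    def take(self, source):
        """Remove and return the earliest message matching `source`, or None."""
        for i, (sender, _payload) in enumerate(self.arrived):
            if source == ANY_SOURCE or sender == source:
                return self.arrived.pop(i)
        return None  # a real library would block here instead


def recv_record(mailbox, log):
    """Record phase: promiscuous receive, logging only the sender."""
    sender, payload = mailbox.take(ANY_SOURCE)
    log.append(sender)
    return sender, payload


def recv_replay(mailbox, log_iter):
    """Replay phase: turn the promiscuous receive into a directed one."""
    sender = next(log_iter)
    return mailbox.take(sender)


class FailedTestRecorder:
    """Record phase: count failed nonblocking tests before each success."""

    def __init__(self, log):
        self.log = log
        self.failed = 0

    def test(self, really_done):
        if really_done:
            self.log.append(self.failed)
            self.failed = 0
            return True
        self.failed += 1
        return False


class FailedTestReplayer:
    """Replay phase: fail the logged number of times, then wait and succeed."""

    def __init__(self, log):
        self.log = iter(log)
        self.remaining = None

    def test(self, wait):
        if self.remaining is None:
            self.remaining = next(self.log)
        if self.remaining > 0:
            self.remaining -= 1
            return False
        wait()  # block until the operation has really completed
        self.remaining = None
        return True
```

In this sketch, a record run calls `recv_record` and `FailedTestRecorder.test`, and a replay run feeds the resulting logs to `recv_replay` and `FailedTestReplayer.test`. Even if messages arrive in a different order during replay, each receive is matched against the logged sender, so the program observes the same sequence of messages and the same number of failed tests as in the recorded run.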