Nondeterminism is characteristic of many parallel programs and requires dedicated support from analysis tools and programming environments. Record-and-replay mechanisms are the most common way to enable cyclic debugging of such programs. These techniques operate in two phases: a record phase traces a program's execution, which can then be repeated arbitrarily often during subsequent replay phases. In contrast to most existing approaches, this paper describes a mechanism that is transparently integrated into the underlying message-passing interface. The main advantage of this approach is its omnipresence: because recording is always active, a program's execution can be repeated immediately after it has been observed. Further benefits are that no instrumentation is required and that the whole technique is correspondingly simpler for inexperienced users. The main difficulty addressed by this approach is monitoring overhead: the mechanism must neither perturb the program's execution nor generate huge amounts of trace data.
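The two-phase idea can be illustrated with a small sketch. This is not the paper's mechanism; it is a toy, single-process model that assumes the only source of nondeterminism is a wildcard receive, which may match any pending message. The class `Mailbox` and the functions `recv_any` and `run` are hypothetical names introduced here for illustration. The record phase logs only the sender chosen by each receive, not the payload, which keeps the trace small. The replay phase forces each receive to match the logged sender.

```python
import random


class Mailbox:
    """Toy stand-in for a message-passing layer (hypothetical, for illustration)."""

    def __init__(self, mode="record", trace=None):
        self.mode = mode                      # "record" or "replay"
        self.trace = [] if trace is None else list(trace)
        self._cursor = 0                      # next trace entry to replay
        self._pending = []                    # (sender, payload) pairs not yet received
        self._rng = random.Random()           # models nondeterministic arrival order

    def send(self, sender, payload):
        self._pending.append((sender, payload))

    def recv_any(self):
        """Wildcard receive: may match a message from any sender."""
        if not self._pending:
            raise RuntimeError("no pending message")
        if self.mode == "record":
            # Nondeterministic choice; log only the sender id, never the payload.
            index = self._rng.randrange(len(self._pending))
            self.trace.append(self._pending[index][0])
        else:
            # Replay: force the match that was observed during recording.
            wanted = self.trace[self._cursor]
            self._cursor += 1
            index = next(
                i for i, (sender, _) in enumerate(self._pending) if sender == wanted
            )
        return self._pending.pop(index)


def run(mailbox, workers=4):
    """A tiny 'program': every worker sends one result, then all are received."""
    for sender in range(workers):
        mailbox.send(sender, f"result from {sender}")
    return [mailbox.recv_any() for _ in range(workers)]


# Record phase: observe one execution and keep its trace.
recorded = Mailbox(mode="record")
first = run(recorded)

# Replay phase: re-execute under control of the recorded trace.
replayed = Mailbox(mode="replay", trace=recorded.trace)
second = run(replayed)

print(first == second)
```

In this sketch the recording logic sits inside the receive call itself, so the `run` function needs no changes between phases. That mirrors, in a very simplified way, the idea of placing the mechanism in the message-passing layer rather than instrumenting the program.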