Cyclic debugging of large-scale parallel programs is often ruled out by the program's size and runtime and by the associated computing costs. The conventional workaround, scaling down the number of processes and/or the problem size, is only partially satisfactory, because the program's behavior may differ significantly from its behavior at the original scale, and some errors may vanish completely. Process isolation offers a different solution: it allows single processes of an arbitrary parallel program to be extracted and executed on their own. By simulating the surroundings of the selected process, the method mimics the program's full-scale behavior as seen by that process. Combined with a grouping strategy, it reduces the number of actually running processes to a small group rather than a single process. Well-established debugging tools and their error-detection functionality can then be applied to a manageable number of processes. In addition, the grouping mechanism makes it possible to adjust the size of the traces needed to simulate process interaction.
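The relationship between grouping and trace size can be illustrated with a minimal sketch. It is not the tool described here: the `Message` record, `boundary_trace`, and `ReplayEnvironment` are hypothetical names, and the sketch assumes that a full-scale run has already been recorded as a flat list of point-to-point messages. Under that assumption, only messages that enter the isolated group from outside need to be kept, since messages inside the group are produced again by the live processes.

```python
from collections import namedtuple

# Hypothetical trace record: one message observed during a full-scale run.
Message = namedtuple("Message", ["src", "dst", "payload"])


def boundary_trace(full_trace, group):
    """Keep only messages that enter the group from processes outside it."""
    group = set(group)
    return [m for m in full_trace if m.dst in group and m.src not in group]


class ReplayEnvironment:
    """Stands in for every process outside the isolated group (assumed design)."""

    def __init__(self, trace):
        # Queue recorded payloads per (sender, receiver) pair, in recorded order.
        self._pending = {}
        for m in trace:
            self._pending.setdefault((m.src, m.dst), []).append(m.payload)

    def recv(self, src, dst):
        """Return the next recorded payload that src sent to dst."""
        return self._pending[(src, dst)].pop(0)


# Toy recording: ranks 0 and 1 communicate heavily, ranks 2 and 3 only rarely.
full_trace = [Message(0, 1, i) for i in range(5)]
full_trace += [Message(2, 1, 99), Message(3, 0, 7)]

single = boundary_trace(full_trace, [1])    # isolate rank 1 alone
pair = boundary_trace(full_trace, [0, 1])   # isolate ranks 0 and 1 together

env = ReplayEnvironment(single)
first_payload = env.recv(src=0, dst=1)      # replayed instead of running rank 0
```

In this toy recording, isolating rank 1 alone requires six recorded messages, while grouping it with its main communication partner, rank 0, leaves only two. The trade-off is that two processes have to run instead of one.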