Debugging is often the most time-consuming part of software development. HPC applications prolong the process further, because more processes interact in dynamic ways over longer periods of time. Checkpoint/restart-enabled parallel debugging returns the developer to an intermediate state closer to the bug. This focuses the debugging process and saves developers considerable time, but it requires parallel debuggers to cooperate with MPI implementations and checkpointers. This paper presents a design specification for such a cooperative relationship. It also discusses the application of this design to the GDB and DDT debuggers, Open MPI, and BLCR.