Adaptive message logging for incremental replay of message-passing programs
Proceedings of the 1993 ACM/IEEE conference on Supercomputing
This paper presents adaptive message logging, a technique that traces dependences between messages and checkpoints and logs messages selectively, so that users can accurately and efficiently replay specific portions of a parallel program. Trace size is reduced by logging only those messages that cannot be quickly recomputed during replay. When execution is restarted from the right set of checkpoints, many of the messages needed for a specific replay can be recomputed during the replay itself.
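The idea can be illustrated with a small sketch. This is not the paper's algorithm. It is a toy model built on assumptions that do not come from the abstract: time is a single integer clock, the cost of recomputing a message is taken to be the distance from the sender's latest checkpoint to the send event, and a message is logged only when that cost exceeds a hypothetical `bound`. The names `Message`, `should_log`, and `replay_set` are illustrative only.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    sender: str
    send_time: int
    receiver: str
    recv_time: int


def latest_checkpoint(times, t):
    """Latest checkpoint time <= t. `times` is sorted and starts with 0."""
    return times[bisect_right(times, t) - 1]


def should_log(msg, checkpoints, bound):
    """Assumed policy: log a message only if re-running the sender from its
    latest checkpoint up to the send would take longer than `bound`."""
    start = latest_checkpoint(checkpoints[msg.sender], msg.send_time)
    return msg.send_time - start > bound


def replay_set(target, start_time, end_time, messages, checkpoints, bound):
    """Return {process: checkpoint time} to restart from so that `target`
    can be replayed over [start_time, end_time]. Unlogged messages pull
    their senders into the replay, transitively."""
    restart = {target: latest_checkpoint(checkpoints[target], start_time)}
    until = {target: end_time}
    work = [target]
    while work:
        p = work.pop()
        for m in messages:
            if m.receiver != p or not (restart[p] < m.recv_time <= until[p]):
                continue  # not received by p inside its replay window
            if should_log(m, checkpoints, bound):
                continue  # contents come from the log; sender not needed
            q = m.sender
            ckpt = latest_checkpoint(checkpoints[q], m.send_time)
            new_restart = min(restart.get(q, ckpt), ckpt)
            new_until = max(until.get(q, m.send_time), m.send_time)
            if restart.get(q) != new_restart or until.get(q) != new_until:
                restart[q], until[q] = new_restart, new_until
                work.append(q)  # q's window grew, so re-examine its inputs
    return restart


# Hypothetical three-process run with checkpoint times per process.
checkpoints = {"A": [0, 10], "B": [0, 8], "C": [0]}
messages = [
    Message("B", 9, "A", 12),  # cheap to recompute from B's checkpoint at 8
    Message("C", 7, "A", 14),  # expensive to recompute, so it is logged
    Message("C", 3, "B", 9),   # cheap to recompute from C's start
]
bound = 3
logged = [m for m in messages if should_log(m, checkpoints, bound)]
plan = replay_set("A", 11, 15, messages, checkpoints, bound)
```

In this example only the message sent by C at time 7 is logged. Replaying A from time 11 restarts A at its checkpoint at 10, B at 8, and C at 0, and the two unlogged messages are regenerated during the replay rather than read from the trace.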