Debugging Parallel Programs with Instant Replay
IEEE Transactions on Computers
Optimal tracing and incremental reexecution for debugging long-running programs
PLDI '94 Proceedings of the ACM SIGPLAN 1994 conference on Programming language design and implementation
LCLint: a tool for using specifications to check code
SIGSOFT '94 Proceedings of the 2nd ACM SIGSOFT symposium on Foundations of software engineering
RecPlay: a fully integrated practical record/replay system
ACM Transactions on Computer Systems (TOCS)
A "flight data recorder" for enabling full-system multiprocessor deterministic replay
Proceedings of the 30th annual international symposium on Computer architecture
An Overview of Checkpointing in Uniprocessor and DistributedSystems, Focusing on Implementation and Performance
The design and implementation of Zap: a system for migrating computing environments
OSDI '02 Proceedings of the 5th symposium on Operating systems design and implementationCopyright restrictions prevent ACM from being able to make the PDFs for this conference available for downloading
Pin: building customized program analysis tools with dynamic instrumentation
Proceedings of the 2005 ACM SIGPLAN conference on Programming language design and implementation
BugNet: Continuously Recording Program Execution for Deterministic Replay Debugging
Proceedings of the 32nd annual international symposium on Computer Architecture
Jockey: a user-space library for record-replay debugging
Proceedings of the sixth international symposium on Automated analysis-driven debugging
Recording shared memory dependencies using strata
Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
Flashback: a lightweight extension for rollback and deterministic replay for software debugging
ATEC '04 Proceedings of the annual conference on USENIX Annual Technical Conference
Fault Tolerance in Multiprocessor Systems Via Application Cloning
ICDCS '07 Proceedings of the 27th International Conference on Distributed Computing Systems
Triage: diagnosing production run failures at the user's site
Proceedings of twenty-first ACM SIGOPS symposium on Operating systems principles
Execution replay of multiprocessor virtual machines
Proceedings of the fourth ACM SIGPLAN/SIGOPS international conference on Virtual execution environments
Transparent checkpoint-restart of multiple processes on commodity operating systems
ATC'07 2007 USENIX Annual Technical Conference on Proceedings of the USENIX Annual Technical Conference
Decoupling dynamic program analysis from execution in virtual environments
ATC'08 USENIX 2008 Annual Technical Conference on Annual Technical Conference
ASSURE: automatic software self-healing using rescue points
Proceedings of the 14th international conference on Architectural support for programming languages and operating systems
Capo: a software-hardware interface for practical deterministic multiprocessor replay
Proceedings of the 14th international conference on Architectural support for programming languages and operating systems
Automatically patching errors in deployed software
Proceedings of the ACM SIGOPS 22nd symposium on Operating systems principles
PRES: probabilistic replay with execution sketching on multiprocessors
Proceedings of the ACM SIGOPS 22nd symposium on Operating systems principles
ODR: output-deterministic replay for multicore debugging
Proceedings of the ACM SIGOPS 22nd symposium on Operating systems principles
Multi-stage replay with crosscut
Proceedings of the 6th ACM SIGPLAN/SIGOPS international conference on Virtual execution environments
Transparent, lightweight application execution replay on commodity multiprocessor operating systems
Proceedings of the ACM SIGMETRICS international conference on Measurement and modeling of computer systems
R2: an application-level kernel for record and replay
OSDI'08 Proceedings of the 8th USENIX conference on Operating systems design and implementation
Finding and reproducing Heisenbugs in concurrent programs
OSDI'08 Proceedings of the 8th USENIX conference on Operating systems design and implementation
Improving Software Diagnosability via Log Enhancement
ACM Transactions on Computer Systems (TOCS) - Special Issue APLOS 2011
Be conservative: enhancing failure diagnosis with proactive logging
OSDI'12 Proceedings of the 10th USENIX conference on Operating Systems Design and Implementation
RaceFree: an efficient multi-threading model for determinism
Proceedings of the 18th ACM SIGPLAN symposium on Principles and practice of parallel programming
Transparent mutable replay for multicore debugging and patch validation
Proceedings of the eighteenth international conference on Architectural support for programming languages and operating systems
Techniques for efficient in-memory checkpointing
Proceedings of the 9th Workshop on Hot Topics in Dependable Systems
Efficient deterministic multithreading without global barriers
Proceedings of the 19th ACM SIGPLAN symposium on Principles and practice of parallel programming
Hi-index | 0.00 |
Software bugs that occur in production are often difficult to reproduce in the lab due to subtle differences in the application environment and nondeterminism. To address this problem, we present Transplay, a system that captures production software bugs into small per-bug recordings which are used to reproduce the bugs on a completely different operating system without access to any of the original software used in the production environment. Transplay introduces partial checkpointing, a new mechanism that efficiently captures the partial state necessary to reexecute just the last few moments of the application before it encountered a failure. The recorded state, which typically consists of a few megabytes of data, is used to replay the application without requiring the specific application binaries, libraries, support data, or the original execution environment. Transplay integrates with existing debuggers to provide standard debugging facilities to allow the user to examine the contents of variables and other program state at each source line of the application's replayed execution. We have implemented a Transplay prototype that can record unmodified Linux applications and replay them on different versions of Linux as well as Windows. Experiments with several applications including Apache and MySQL show that Transplay can reproduce real bugs and be used in production with modest recording overhead.