Optimal tracing and replay for debugging message-passing parallel programs
Proceedings of the 1992 ACM/IEEE conference on Supercomputing
Time, clocks, and the ordering of events in a distributed system
Communications of the ACM
Internet indirection infrastructure
Proceedings of the 2002 conference on Applications, technologies, architectures, and protocols for computer communications
Deterministic Replay of Distributed Java Applications
IPDPS '00 Proceedings of the 14th International Symposium on Parallel and Distributed Processing
A "flight data recorder" for enabling full-system multiprocessor deterministic replay
Proceedings of the 30th annual international symposium on Computer architecture
BugNet: Continuously Recording Program Execution for Deterministic Replay Debugging
Proceedings of the 32nd annual international symposium on Computer Architecture
Jockey: a user-space library for record-replay debugging
Proceedings of the sixth international symposium on Automated analysis-driven debugging
On the design of a pervasive debugger
Proceedings of the sixth international symposium on Automated analysis-driven debugging
Debugging operating systems with time-traveling virtual machines
ATEC '05 Proceedings of the annual conference on USENIX Annual Technical Conference
Flashback: a lightweight extension for rollback and deterministic replay for software debugging
ATEC '04 Proceedings of the annual conference on USENIX Annual Technical Conference
Non-transitive connectivity and DHTs
WORLDS'05 Proceedings of the 2nd conference on Real, Large Distributed Systems - Volume 2
OCALA: an architecture for supporting legacy applications over overlays
NSDI'06 Proceedings of the 3rd conference on Networked Systems Design & Implementation - Volume 3
Using deterministic replay for debugging of distributed real-time systems
Euromicro-RTS'00 Proceedings of the 12th Euromicro conference on Real-time systems
Using queries for distributed monitoring and forensics
Proceedings of the 1st ACM SIGOPS/EuroSys European Conference on Computer Systems 2006
Correlating multi-session attacks via replay
HOTDEP'06 Proceedings of the 2nd conference on Hot Topics in System Dependability - Volume 2
Triage: diagnosing production run failures at the user's site
Proceedings of twenty-first ACM SIGOPS symposium on Operating systems principles
D3S: debugging deployed distributed systems
NSDI'08 Proceedings of the 5th USENIX Symposium on Networked Systems Design and Implementation
ReCrash: Making Software Failures Reproducible by Preserving Object States
ECOOP '08 Proceedings of the 22nd European conference on Object-Oriented Programming
FlashBox: a system for logging non-deterministic events in deployed embedded systems
Proceedings of the 2009 ACM symposium on Applied Computing
ReCrashJ: a tool for capturing and reproducing program crashes in deployed applications
Proceedings of the the 7th joint meeting of the European software engineering conference and the ACM SIGSOFT symposium on The foundations of software engineering
ODR: output-deterministic replay for multicore debugging
Proceedings of the ACM SIGOPS 22nd symposium on Operating systems principles
DSF: a common platform for distributed systems research and development
Proceedings of the 10th ACM/IFIP/USENIX International Conference on Middleware
Lightweight tracing for wireless sensor networks debugging
Proceedings of the 4th International Workshop on Middleware Tools, Services and Run-Time Support for Sensor Networks
ParaLog: enabling and accelerating online parallel monitoring of multithreaded applications
Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems
A query language for understanding component interactions in production systems
Proceedings of the 24th ACM International Conference on Supercomputing
Transparent, lightweight application execution replay on commodity multiprocessor operating systems
Proceedings of the ACM SIGMETRICS international conference on Measurement and modeling of computer systems
DSF: a common platform for distributed systems research and development
Middleware'09 Proceedings of the ACM/IFIP/USENIX 10th international conference on Middleware
Mugshot: deterministic capture and replay for Javascript applications
NSDI'10 Proceedings of the 7th USENIX conference on Networked systems design and implementation
Lightweight, high-resolution monitoring for troubleshooting production systems
OSDI'08 Proceedings of the 8th USENIX conference on Operating systems design and implementation
R2: an application-level kernel for record and replay
OSDI'08 Proceedings of the 8th USENIX conference on Operating systems design and implementation
Language-based replay via data flow cut
Proceedings of the eighteenth ACM SIGSOFT international symposium on Foundations of software engineering
Focus replay debugging effort on the control plane
HotDep'10 Proceedings of the Sixth international conference on Hot topics in system dependability
Correlating multi-session attacks via replay
HotDep'06 Proceedings of the Second conference on Hot topics in system dependability
WiDS checker: combating bugs in distributed systems
NSDI'07 Proceedings of the 4th USENIX conference on Networked systems design & implementation
Friday: global comprehension for distributed replay
NSDI'07 Proceedings of the 4th USENIX conference on Networked systems design & implementation
OFRewind: enabling record and replay troubleshooting for networks
USENIXATC'11 Proceedings of the 2011 USENIX conference on USENIX annual technical conference
Partial replay of long-running applications
Proceedings of the 19th ACM SIGSOFT symposium and the 13th European conference on Foundations of software engineering
VEE '12 Proceedings of the 8th ACM SIGPLAN/SIGOPS conference on Virtual Execution Environments
Structured comparative analysis of systems logs to diagnose performance problems
NSDI'12 Proceedings of the 9th USENIX conference on Networked Systems Design and Implementation
DejaVu: integrated support for developing interactive camera-based programs
Proceedings of the 25th annual ACM symposium on User interface software and technology
X-ray: automating root-cause diagnosis of performance anomalies in production software
OSDI'12 Proceedings of the 10th USENIX conference on Operating Systems Design and Implementation
Retrospect: deterministic replay of MPI applications for interactive distributed debugging
PVM/MPI'07 Proceedings of the 14th European conference on Recent Advances in Parallel Virtual Machine and Message Passing Interface
Predicting method crashes with bytecode operations
Proceedings of the 6th India Software Engineering Conference
Chronicler: lightweight recording to reproduce field failures
Proceedings of the 2013 International Conference on Software Engineering
On fault resilience of OpenStack
Proceedings of the 4th annual Symposium on Cloud Computing
Hi-index | 0.00 |
We have developed a new replay debugging tool, liblog, for distributed C/C++ applications. It logs the execution of deployed application processes and replays them deterministically, faithfully reproducing race conditions and non-deterministic failures, enabling careful offline analysis. To our knowledge, liblog is the first replay tool to address the requirements of large distributed systems: lightweight support for long-running programs, consistent replay of arbitrary subsets of application nodes, and operation in a mixed environment of logging and nonlogging processes. In addition, it requires no special hardware or kernel patches, supports unmodified application executables, and integrates GDB into the replay mechanism for simultaneous source-level debugging of multiple processes. This paper presents liblog's design, an evaluation of its runtime overhead, and a discussion of our experience with the tool to date.