Transparent mutable replay for multicore debugging and patch validation

  • Authors:
  • Nicolas Viennot;Siddharth Nair;Jason Nieh

  • Affiliations:
  • Columbia University, New York, NY, USA;Columbia University, New York, NY, USA;Columbia University, New York, NY, USA

  • Venue:
  • Proceedings of the eighteenth international conference on Architectural support for programming languages and operating systems
  • Year:
  • 2013

Quantified Score

Hi-index 0.00

Visualization

Abstract

We present Dora, a mutable record-replay system which allows a recorded execution of an application to be replayed with a modified version of the application. This feature, not available in previous record-replay systems, enables powerful new functionality. In particular, Dora can help reproduce, diagnose, and fix software bugs by replaying a version of a recorded application that is recompiled with debugging information, reconfigured to produce verbose log output, modified to include additional print statements, or patched to fix a bug. Dora uses lightweight operating system mechanisms to record an application execution by capturing nondeterministic events to a log without imposing unnecessary timing and ordering constraints. It replays the log using a modified version of the application even in the presence of added, deleted, or modified operations that do not match events in the log. Dora searches for a replay that minimizes differences between the log and the replayed execution of the modified program. If there are no modifications, Dora provides deterministic replay of the unmodified program. We have implemented a Linux prototype which provides transparent mutable replay without recompiling or relinking applications. We show that Dora is useful for reproducing, diagnosing, and fixing software bugs in real-world applications, including Apache and MySQL. Our results show that Dora (1) captures bugs and replays them with applications modified or reconfigured to produce additional debugging output for root cause diagnosis, (2) captures exploits and replays them with patched applications to validate that the patches successfully eliminate vulnerabilities, (3) records production workloads and replays them with patched applications to validate patches with realistic workloads, and (4) maintains low recording overhead on commodity multicore hardware, making it suitable for production systems.