ODR: output-deterministic replay for multicore debugging

  • Authors:
  • Gautam Altekar; Ion Stoica

  • Affiliations:
  • UC Berkeley, Berkeley, USA; UC Berkeley, Berkeley, USA

  • Venue:
  • Proceedings of the ACM SIGOPS 22nd Symposium on Operating Systems Principles
  • Year:
  • 2009

Abstract

Reproducing bugs is hard. Deterministic replay systems address this problem by providing a high-fidelity replica of an original program run that can be repeatedly executed to zero in on bugs. Unfortunately, existing replay systems for multiprocessor programs fall short: they incur high overheads, rely on non-standard multiprocessor hardware, or fail to reliably reproduce executions. Their primary stumbling block is data races, a source of nondeterminism that must be captured if executions are to be faithfully reproduced. In this paper, we present ODR, a software-only replay system that reproduces bugs and provides low-overhead multiprocessor recording. The key observation behind ODR is that, for debugging purposes, a replay system does not need to generate a high-fidelity replica of the original execution. Instead, it suffices to produce any execution that exhibits the same outputs as the original. Guided by this observation, ODR relaxes its fidelity guarantees to avoid the problem of reproducing data races altogether. The result is a system that replays real multiprocessor applications, such as Apache, MySQL, and the Java Virtual Machine, and provides low record-mode overhead.
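
The key observation can be illustrated with a toy model. The sketch below is not ODR's mechanism, which the abstract does not describe; it assumes a hypothetical two-thread program with a racy counter increment, records only the program's output, and uses brute-force enumeration of interleavings as a stand-in for whatever search ODR actually performs. All names (`run`, `all_schedules`, `find_output_equivalent`) are invented for illustration.

```python
from itertools import permutations

# Hypothetical toy program: each of two threads performs two steps,
# a read of the shared counter followed by a write of (value read + 1).
STEPS_PER_THREAD = 2


def run(schedule):
    """Execute the toy program under one interleaving and return its output."""
    counter = 0
    local = {}           # per-thread register holding the value it read
    pc = {0: 0, 1: 0}    # per-thread program counter
    for tid in schedule:
        if pc[tid] == 0:
            local[tid] = counter       # racy read
        else:
            counter = local[tid] + 1   # racy write
        pc[tid] += 1
    return f"counter={counter}"


def all_schedules():
    """Enumerate every distinct interleaving of the two threads' steps."""
    steps = [0] * STEPS_PER_THREAD + [1] * STEPS_PER_THREAD
    return sorted(set(permutations(steps)))


def find_output_equivalent(recorded_output):
    """Return the first interleaving whose output matches the recording."""
    for schedule in all_schedules():
        if run(schedule) == recorded_output:
            return schedule
    return None


# "Record": the race outcome itself is never logged, only the output.
original_schedule = (1, 0, 1, 0)
recorded_output = run(original_schedule)

# "Replay": accept any interleaving that produces the same output.
replay_schedule = find_output_equivalent(recorded_output)
print(recorded_output, original_schedule, replay_schedule)
```

In this toy, the replayed interleaving differs from the original one, yet it exhibits the same lost-update output, which is all a developer needs in order to observe the bug. Exhaustive search is only workable here because the program has six interleavings; it should not be read as a claim about how a real system keeps the search tractable.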