Replay debugging for distributed applications

  • Authors: Dennis Geels, Gautam Altekar, Scott Shenker, Ion Stoica

  • Affiliations: University of California, Berkeley (all authors)

  • Venue: ATEC '06 Proceedings of the annual conference on USENIX '06 Annual Technical Conference
  • Year: 2006

Abstract

We have developed a new replay debugging tool, liblog, for distributed C/C++ applications. It logs the execution of deployed application processes and replays them deterministically, faithfully reproducing race conditions and non-deterministic failures, enabling careful offline analysis. To our knowledge, liblog is the first replay tool to address the requirements of large distributed systems: lightweight support for long-running programs, consistent replay of arbitrary subsets of application nodes, and operation in a mixed environment of logging and nonlogging processes. In addition, it requires no special hardware or kernel patches, supports unmodified application executables, and integrates GDB into the replay mechanism for simultaneous source-level debugging of multiple processes. This paper presents liblog's design, an evaluation of its runtime overhead, and a discussion of our experience with the tool to date.
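
The record-then-replay idea in the abstract can be illustrated with a small sketch. It is not liblog's implementation, and the abstract does not describe the mechanism at this level. The sketch assumes only that every non-deterministic result (here a clock read and a random draw) is written to a log during the original run and read back from that log during replay. The names `ReplayLog`, `call`, and `application` are hypothetical and chosen for illustration.

```python
import random
import time


class ReplayLog:
    """Hypothetical record/replay wrapper for non-deterministic calls."""

    def __init__(self, entries=None):
        # No entries: record mode. Entries supplied: replay mode.
        self.replaying = entries is not None
        self.entries = list(entries) if entries is not None else []
        self.cursor = 0

    def call(self, name, func, *args):
        if self.replaying:
            # Replay: return the logged result instead of calling func.
            if self.cursor >= len(self.entries):
                raise RuntimeError("replay ran past the end of the log")
            logged_name, result = self.entries[self.cursor]
            if logged_name != name:
                raise RuntimeError(
                    f"replay diverged: expected {logged_name!r}, got {name!r}"
                )
            self.cursor += 1
            return result
        # Record: perform the real call and append its result to the log.
        result = func(*args)
        self.entries.append((name, result))
        return result


def application(log):
    """Toy application whose output depends on the clock and a random draw."""
    started = log.call("time", time.time)
    draw = log.call("randint", random.randint, 1, 1_000_000)
    return (started, draw)


# Original run: results are logged as they are produced.
recorded = ReplayLog()
original = application(recorded)

# Replay run: the same code path sees exactly the logged values.
replayed = application(ReplayLog(recorded.entries))
assert replayed == original
```

Because the replayed run consumes logged values in order, any behaviour that depended on them, including a failure, recurs identically, which is what makes offline analysis possible. A tool for real distributed programs also has to handle threads, signals, and network messages between processes, all of which this sketch ignores.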