Live Debugging of Distributed Systems

  • Authors:
  • Darren Dao;Jeannie Albrecht;Charles Killian;Amin Vahdat

  • Affiliations:
  • University of California, San Diego, La Jolla;Williams College, Williamstown;Purdue University, West Lafayette;University of California, San Diego, La Jolla

  • Venue:
  • CC '09 Proceedings of the 18th International Conference on Compiler Construction: Held as Part of the Joint European Conferences on Theory and Practice of Software, ETAPS 2009
  • Year:
  • 2009

Quantified Score

Hi-index 0.00

Visualization

Abstract

Debugging distributed systems is challenging. Although incremental debugging during development finds some bugs, developers are rarely able to fully test their systems under realistic operating conditions prior to deployment. While deploying a system exposes it to realistic conditions, debugging requires the developer to: (i) detect a bug, (ii) gather the system state necessary for diagnosis, and (iii) sift through the gathered state to determine a root cause. In this paper, we present MaceODB, a tool to assist programmers with debugging deployed distributed systems. Programmers define a set of runtime properties for their system, which MaceODB checks for violations during execution. Once MaceODB detects a violation, it provides the programmer with the information to determine its root cause. We have been able to diagnose several non-trivial bugs in existing mature distributed systems using MaceODB; we discuss two of these bugs in this paper. Benchmarks indicate that the approach has low overhead and is suitable for in situ debugging of deployed systems.