Using queries for distributed monitoring and forensics

  • Authors: Atul Singh; Petros Maniatis; Timothy Roscoe; Peter Druschel
  • Affiliations: Rice University; Intel Research Berkeley; Intel Research Berkeley; Rice University
  • Venue: Proceedings of the 1st ACM SIGOPS/EuroSys European Conference on Computer Systems 2006
  • Year: 2006

Abstract

Distributed systems are hard to build, profile, debug, and test. Monitoring a distributed system (to detect and analyze bugs, test for regressions, or identify fault-tolerance problems and security compromises) can be difficult and error-prone. In this paper we argue that declarative development of distributed systems is well suited to these tasks. We present an application logging, monitoring, and debugging facility that we have built on top of the P2 system, comprising an introspection model, an execution tracing component, and a distributed query processor. We use this facility to demonstrate a variety of on-line distributed diagnosis tools, ranging from simple, local state assertions to sophisticated global property detectors on consistent snapshots. These tools are small, simple, and can be deployed piecemeal on-line at any point during a system's life cycle. Our evaluation suggests that the overhead of continuously monitoring and improving running distributed systems with our approach is commensurate with its benefits.