Using queries for distributed monitoring and forensics

Authors:
Atul Singh;Petros Maniatis;Timothy Roscoe;Peter Druschel
Affiliations:
Rice University;Intel Research Berkeley;Intel Research Berkeley;Rice University
Venue:
Proceedings of the 1st ACM SIGOPS/EuroSys European Conference on Computer Systems 2006
Year:
2006

Citing 24
Cited 20

A relational approach to monitoring complex systems

ACM Transactions on Computer Systems (TOCS)
Managing Communication Networks by Monitoring Databases

IEEE Transactions on Software Engineering
Dynamic control of performance monitoring on large scale parallel systems

ICS '93 Proceedings of the 7th international conference on Supercomputing
Distributed snapshots: determining global states of distributed systems

ACM Transactions on Computer Systems (TOCS)
Semantic issues in the design of languages for debugging

Computer Languages
The click modular router

ACM Transactions on Computer Systems (TOCS)
Understanding BGP misconfiguration

Proceedings of the 2002 conference on Applications, technologies, architectures, and protocols for computer communications
Chord: a scalable peer-to-peer lookup protocol for internet applications

IEEE/ACM Transactions on Networking (TON)
A Relational Model for Distributed Systems Monitoring using Flexible Agents

SDNE '96 Proceedings of the 3rd Workshop on Services in Distributed and Networked Environments (SDNE '96)
Performance debugging for distributed systems of black boxes

SOSP '03 Proceedings of the nineteenth ACM symposium on Operating systems principles
Backtracking intrusions

SOSP '03 Proceedings of the nineteenth ACM symposium on Operating systems principles
Using Hy+ for network management and distributed debugging

CASCON '93 Proceedings of the 1993 conference of the Centre for Advanced Studies on Collaborative research: software engineering - Volume 1
Building Self-Configuring Services Using Service-Specific Knowledge

HPDC '04 Proceedings of the 13th IEEE International Symposium on High Performance Distributed Computing
Implementing declarative overlays

Proceedings of the twentieth ACM symposium on Operating systems principles
Dependable software needs pervasive debugging

EW 10 Proceedings of the 10th workshop on ACM SIGOPS European workshop
WiDS: an integrated toolkit for distributed system development

HOTOS'05 Proceedings of the 10th conference on Hot Topics in Operating Systems - Volume 10
Causeway: operating system support for controlling and analyzing the execution of distributed programs

HOTOS'05 Proceedings of the 10th conference on Hot Topics in Operating Systems - Volume 10
Path-based failure and evolution management

NSDI'04 Proceedings of the 1st conference on Symposium on Networked Systems Design and Implementation - Volume 1
Configuration debugging as search: finding the needle in the haystack

OSDI'04 Proceedings of the 6th conference on Symposium on Operating Systems Design & Implementation - Volume 6
Automatic misconfiguration troubleshooting with peerpressure

OSDI'04 Proceedings of the 6th conference on Symposium on Operating Systems Design & Implementation - Volume 6
Using magpie for request extraction and workload modelling

OSDI'04 Proceedings of the 6th conference on Symposium on Operating Systems Design & Implementation - Volume 6
Replay debugging for distributed applications

ATEC '06 Proceedings of the annual conference on USENIX '06 Annual Technical Conference
Pip: detecting the unexpected in distributed systems

NSDI'06 Proceedings of the 3rd conference on Networked Systems Design & Implementation - Volume 3
A root cause localization model for large scale systems

HotDep'05 Proceedings of the First conference on Hot topics in system dependability

The design and implementation of a declarative sensor network system

Proceedings of the 5th international conference on Embedded networked sensor systems
Distributed Watchpoints: Debugging Large Modular Robot Systems

International Journal of Robotics Research
D3S: debugging deployed distributed systems

NSDI'08 Proceedings of the 5th USENIX Symposium on Networked Systems Design and Implementation
Evita raced: metacompilation for declarative networks

Proceedings of the VLDB Endowment
Self-correlating predictive information tracking for large-scale production systems

ICAC '09 Proceedings of the 6th international conference on Autonomic computing
CrystalBall: predicting and preventing inconsistencies in deployed distributed systems

NSDI'09 Proceedings of the 6th USENIX symposium on Networked systems design and implementation
Cardinality Abstraction for Declarative Networking Applications

CAV '09 Proceedings of the 21st International Conference on Computer Aided Verification
Declarative networking

Communications of the ACM - Scratch Programming for All
On-the-fly progress detection in iterative stream queries

Proceedings of the VLDB Endowment
Predicting and preventing inconsistencies in deployed distributed systems

ACM Transactions on Computer Systems (TOCS)
Boom analytics: exploring data-centric, declarative programming for the cloud

Proceedings of the 5th European conference on Computer systems
Efficient querying and maintenance of network provenance at internet-scale

Proceedings of the 2010 ACM SIGMOD International Conference on Management of data
A query language for understanding component interactions in production systems

Proceedings of the 24th ACM International Conference on Supercomputing
Measurement and diagnosis of address misconfigured P2P traffic

INFOCOM'10 Proceedings of the 29th conference on Information communications
Towards automatically checking thousands of failures with micro-specifications

HotDep'10 Proceedings of the Sixth international conference on Hot topics in system dependability
FATE and DESTINI: a framework for cloud recovery testing

Proceedings of the 8th USENIX conference on Networked systems design and implementation
WiDS checker: combating bugs in distributed systems

NSDI'07 Proceedings of the 4th USENIX conference on Networked systems design & implementation
Friday: global comprehension for distributed replay

NSDI'07 Proceedings of the 4th USENIX conference on Networked systems design & implementation
Secure network provenance

SOSP '11 Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles
Detecting problematic message sequences and frequencies in distributed systems

Proceedings of the ACM international conference on Object oriented programming systems languages and applications

Abstract

Distributed systems are hard to build, profile, debug, and test. Monitoring a distributed system (to detect and analyze bugs, test for regressions, or identify fault-tolerance problems or security compromises) can be difficult and error-prone. In this paper we argue that declarative development of distributed systems is well suited to tackle these tasks. We present an application logging, monitoring, and debugging facility that we have built on top of the P2 system, comprising an introspection model, an execution tracing component, and a distributed query processor. We use this facility to demonstrate a variety of on-line distributed diagnosis tools, ranging from simple, local state assertions to sophisticated global property detectors on consistent snapshots. These tools are small and simple, and can be deployed piecemeal on-line at any point during a system's life cycle. Our evaluation suggests that the overhead of our approach to improving and monitoring running distributed systems continuously is well in tune with its benefits.