Distributed systems are hard to build, profile, debug, and test. Monitoring a distributed system (to detect and analyze bugs, test for regressions, or identify fault-tolerance problems or security compromises) can be difficult and error-prone. In this paper we argue that declarative development of distributed systems is well suited to these tasks. We present an application logging, monitoring, and debugging facility that we have built on top of the P2 system, comprising an introspection model, an execution tracing component, and a distributed query processor. We use this facility to demonstrate a set of on-line distributed diagnosis tools, from simple, local state assertions to sophisticated global property detectors on consistent snapshots. These tools are small and simple, and can be deployed piecemeal on-line at any point during a system's life cycle. Our evaluation suggests that the overhead of continuously monitoring and improving running distributed systems with our approach is well in tune with its benefits.