Bugs in distributed systems are often hard to find. Many bugs reflect discrepancies between a system's behavior and the programmer's assumptions about that behavior. We present Pip, an infrastructure for comparing actual behavior and expected behavior to expose structural errors and performance problems in distributed systems. Pip allows programmers to express, in a declarative language, expectations about the system's communications structure, timing, and resource consumption. Pip includes system instrumentation and annotation tools to log actual system behavior, and visualization and query tools for exploring expected and unexpected behavior. Pip allows a developer to quickly understand and debug both familiar and unfamiliar systems. We applied Pip to several applications, including FAB, SplitStream, Bullet, and RanSub. We generated most of the instrumentation for all four applications automatically. We found the needed expectations easy to write, starting in each case with automatically generated expectations. Pip found unexpected behavior in each application, and helped to isolate the causes of poor performance and incorrect behavior.
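The core idea above is to check logged behavior against declared expectations about communication structure and timing. A few lines of Python can sketch it. This is a minimal illustration under stated assumptions. It is not Pip's expectation language or its checker. The event kinds (`task`, `send`, `recv`), the `Expectation` and `classify` names, the exact-sequence structure match, and the single end-to-end latency limit are all hypothetical simplifications. The sketch reports a path that matches no expectation as unexpected. A path with the expected structure but excessive latency is flagged as a performance problem.

```python
"""Hypothetical sketch: compare logged paths against declarative expectations.

Not Pip's expectation language; every name here is illustrative.
"""
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Event:
    kind: str     # hypothetical event kinds: "task", "send", or "recv"
    name: str     # task or message name recorded by an annotation
    start: float  # timestamp in seconds
    end: float


@dataclass
class Expectation:
    name: str
    structure: Tuple[Tuple[str, str], ...]  # expected (kind, name) sequence
    max_latency: Optional[float] = None     # end-to-end limit in seconds

    def matches_structure(self, events: List[Event]) -> bool:
        # Simplest possible structural check: exact sequence equality.
        return tuple((e.kind, e.name) for e in events) == self.structure

    def timing_ok(self, events: List[Event]) -> bool:
        # No limit declared, or nothing logged: nothing to violate.
        if self.max_latency is None or not events:
            return True
        return events[-1].end - events[0].start <= self.max_latency


def classify(paths: Dict[str, List[Event]],
             expectations: List[Expectation]) -> Dict[str, str]:
    """Label each path 'ok:<name>', 'slow:<name>', or 'unexpected'."""
    labels = {}
    for path_id, events in paths.items():
        label = "unexpected"  # no expectation describes this structure
        for exp in expectations:
            if exp.matches_structure(events):
                prefix = "ok:" if exp.timing_ok(events) else "slow:"
                label = prefix + exp.name
                break
        labels[path_id] = label
    return labels


if __name__ == "__main__":
    read = Expectation(
        "read",
        (("recv", "ReadReq"), ("task", "lookup"), ("send", "ReadReply")),
        max_latency=0.050,
    )
    paths = {
        "p1": [Event("recv", "ReadReq", 0.000, 0.001),
               Event("task", "lookup", 0.001, 0.010),
               Event("send", "ReadReply", 0.010, 0.011)],
        "p2": [Event("recv", "ReadReq", 0.000, 0.001),
               Event("task", "lookup", 0.001, 0.200),
               Event("send", "ReadReply", 0.200, 0.201)],
        "p3": [Event("recv", "ReadReq", 0.000, 0.001),
               Event("send", "ReadReply", 0.001, 0.002)],
    }
    print(classify(paths, [read]))
```

When run, the script labels `p1` as `ok:read` and `p2` as `slow:read`, because 0.201 s exceeds the 50 ms limit. It labels `p3` as `unexpected` because its `lookup` task is missing. Exact sequence matching keeps the sketch short. However, it cannot express repetition, alternative branches, or the resource-consumption limits mentioned above. A fuller expectation language would have to support those.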