Readings in model-based diagnosis
Readings in model-based diagnosis
PODS '95 Proceedings of the fourteenth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems
Similarity-based queries for time series data
SIGMOD '97 Proceedings of the 1997 ACM SIGMOD international conference on Management of data
A tutorial on learning with Bayesian networks
Learning in graphical models
Causality: models, reasoning, and inference
Causality: models, reasoning, and inference
Independent component analysis: algorithms and applications
Neural Networks
Wavelet synopses with error guarantees
Proceedings of the 2002 ACM SIGMOD international conference on Management of data
Understanding BGP misconfiguration
Proceedings of the 2002 conference on Applications, technologies, architectures, and protocols for computer communications
An Expert System for Real Time Fault Diagnosis of the Italian Telecommunications Network
Proceedings of the IFIP TC6/WG6.6 Third International Symposium on Integrated Network Management with participation of the IEEE Communications Society CNOM and with support from the Institute for Educational Services
Pinpoint: Problem Determination in Large, Dynamic Internet Services
DSN '02 Proceedings of the 2002 International Conference on Dependable Systems and Networks
Compressing historical information in sensor networks
SIGMOD '04 Proceedings of the 2004 ACM SIGMOD international conference on Management of data
STRIDER: A Black-box, State-based Approach to Change and Configuration Management and Support
LISA '03 Proceedings of the 17th USENIX conference on System administration
Failure Diagnosis Using Decision Trees
ICAC '04 Proceedings of the First International Conference on Autonomic Computing
IP fault localization via risk modeling
NSDI'05 Proceedings of the 2nd conference on Symposium on Networked Systems Design & Implementation - Volume 2
Configuration debugging as search: finding the needle in the haystack
OSDI'04 Proceedings of the 6th conference on Symposium on Opearting Systems Design & Implementation - Volume 6
OSDI'04 Proceedings of the 6th conference on Symposium on Opearting Systems Design & Implementation - Volume 6
Automatic misconfiguration troubleshooting with peerpressure
OSDI'04 Proceedings of the 6th conference on Symposium on Opearting Systems Design & Implementation - Volume 6
Why do internet services fail, and what can be done about it?
USITS'03 Proceedings of the 4th conference on USENIX Symposium on Internet Technologies and Systems - Volume 4
Flight data recorder: monitoring persistent-state interactions to improve systems management
OSDI '06 Proceedings of the 7th USENIX Symposium on Operating Systems Design and Implementation - Volume 7
Towards highly reliable enterprise network services via inference of multi-level dependencies
Proceedings of the 2007 conference on Applications, technologies, architectures, and protocols for computer communications
AutoBash: improving configuration management with operating system causality analysis
Proceedings of twenty-first ACM SIGOPS symposium on Operating systems principles
Automated Rule-Based Diagnosis through a Distributed Monitor System
IEEE Transactions on Dependable and Secure Computing
NetPrints: diagnosing home network misconfigurations using shared knowledge
NSDI'09 Proceedings of the 6th USENIX symposium on Networked systems design and implementation
Lightweight, high-resolution monitoring for troubleshooting production systems
OSDI'08 Proceedings of the 8th USENIX conference on Operating systems design and implementation
High speed and robust event correlation
IEEE Communications Magazine
SherLog: error diagnosis by connecting clues from run-time logs
Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems
Automated debugging of SLO violations in enterprise systems
COMSNETS'10 Proceedings of the 2nd international conference on COMmunication systems and NETworks
Webprofiler: cooperative diagnosis of web failures
COMSNETS'10 Proceedings of the 2nd international conference on COMmunication systems and NETworks
Detecting the performance impact of upgrades in large operational networks
Proceedings of the ACM SIGCOMM 2010 conference
Crowdsourcing service-level network event monitoring
Proceedings of the ACM SIGCOMM 2010 conference
Proceedings of the 2010 ACM SIGCOMM workshop on Home networks
WebProphet: automating performance prediction for web services
NSDI'10 Proceedings of the 7th USENIX conference on Networked systems design and implementation
Experiences with tracing causality in networked services
INM/WREN'10 Proceedings of the 2010 internet network management conference on Research on enterprise networking
Relational network-service clustering analysis with set evidences
Proceedings of the 3rd ACM workshop on Artificial intelligence and security
SecureAngle: improving wireless security using angle-of-arrival information
Hotnets-IX Proceedings of the 9th ACM SIGCOMM Workshop on Hot Topics in Networks
Listen to me if you can: tracking user experience of mobile network on social media
IMC '10 Proceedings of the 10th ACM SIGCOMM conference on Internet measurement
What happened in my network: mining network events from router syslogs
IMC '10 Proceedings of the 10th ACM SIGCOMM conference on Internet measurement
G-RCA: a generic root cause analysis platform for service quality management in large IP networks
Proceedings of the 6th International COnference
Proceedings of the 6th International COnference
ACM SIGCOMM Computer Communication Review
Advancing the state of home networking
Communications of the ACM
QoSaaS: quality of service as a service
Hot-ICE'11 Proceedings of the 11th USENIX conference on Hot topics in management of internet, cloud, and enterprise networks and services
Profiling network performance for multi-tier data center applications
Proceedings of the 8th USENIX conference on Networked systems design and implementation
Understanding network failures in data centers: measurement, analysis, and implications
Proceedings of the ACM SIGCOMM 2011 conference
Large-scale app-based reporting of customer problems in cellular networks: potential and limitations
Proceedings of the first ACM SIGCOMM workshop on Measurements up the stack
Performance of networked applications: the challenges in capturing the user's perception
Proceedings of the first ACM SIGCOMM workshop on Measurements up the stack
dFault: fault localization in large-scale peer-to-peer systems
Proceedings of the ACM/IFIP/USENIX 11th International Conference on Middleware
Practical experiences with chronics discovery in large telecommunications systems
SLAML '11 Managing Large-scale Systems via the Analysis of System Logs and the Application of Machine Learning Techniques
Q-score: proactive service quality assessment in a large IPTV system
Proceedings of the 2011 ACM SIGCOMM conference on Internet measurement conference
Rapid detection of maintenance induced changes in service performance
Proceedings of the Seventh COnference on emerging Networking EXperiments and Technologies
Deja vu: fingerprinting network problems
Proceedings of the Seventh COnference on emerging Networking EXperiments and Technologies
Practical experiences with chronics discovery in large telecommunications systems
ACM SIGOPS Operating Systems Review
End-user perspectives of Internet connectivity problems
Computer Networks: The International Journal of Computer and Telecommunications Networking
Structured comparative analysis of systems logs to diagnose performance problems
NSDI'12 Proceedings of the 9th USENIX conference on Networked Systems Design and Implementation
NetPilot: automating datacenter network failure mitigation
Proceedings of the ACM SIGCOMM 2012 conference on Applications, technologies, architectures, and protocols for computer communication
Automated diagnosis without predictability is a recipe for failure
HotCloud'12 Proceedings of the 4th USENIX conference on Hot Topics in Cloud Ccomputing
NetPilot: automating datacenter network failure mitigation
ACM SIGCOMM Computer Communication Review - Special october issue SIGCOMM '12
An approach for failure recognition in IP-based industrial control networks and systems
International Journal of Network Management
Automated home network troubleshooting with device collaboration
Proceedings of the 2012 ACM conference on CoNEXT student workshop
Theia: visual signatures for problem diagnosis in large hadoop clusters
lisa'12 Proceedings of the 26th international conference on Large Installation System Administration: strategies, tools, and techniques
A framework to compute statistics of system parameters from very large trace files
ACM SIGOPS Operating Systems Review
A provider-side view of web search response time
Proceedings of the ACM SIGCOMM 2013 conference on SIGCOMM
Demystifying the dark side of the middle: a field study of middlebox failures in datacenters
Proceedings of the 2013 conference on Internet measurement conference
When the network crumbles: an empirical study of cloud network failures and their impact on services
Proceedings of the 4th annual Symposium on Cloud Computing
An untold story of redundant clouds: making your service deployment truly reliable
Proceedings of the 9th Workshop on Hot Topics in Dependable Systems
Performance troubleshooting in data centers: an annotated bibliography?
ACM SIGOPS Operating Systems Review
Adtributor: revenue debugging in advertising systems
NSDI'14 Proceedings of the 11th USENIX Conference on Networked Systems Design and Implementation
NetCheck: network diagnoses from blackbox traces
NSDI'14 Proceedings of the 11th USENIX Conference on Networked Systems Design and Implementation
Hi-index | 0.02 |
By studying trouble tickets from small enterprise networks, we conclude that their operators need detailed fault diagnosis. That is, the diagnostic system should be able to diagnose not only generic faults (e.g., performance-related) but also application specific faults (e.g., error codes). It should also identify culprits at a fine granularity such as a process or firewall configuration. We build a system, called NetMedic, that enables detailed diagnosis by harnessing the rich information exposed by modern operating systems and applications. It formulates detailed diagnosis as an inference problem that more faithfully captures the behaviors and interactions of fine-grained network components such as processes. The primary challenge in solving this problem is inferring when a component might be impacting another. Our solution is based on an intuitive technique that uses the joint behavior of two components in the past to estimate the likelihood of them impacting one another in the present. We find that our deployed prototype is effective at diagnosing faults that we inject in a live environment. The faulty component is correctly identified as the most likely culprit in 80% of the cases and is almost always in the list of top five culprits.