Faults in an IP network have various causes, such as the failure of one or more routers at the IP layer, fiber cuts, failures of physical elements at the optical layer, or extraneous causes like power outages. These faults are usually detected as failures of a set of dependent logical entities: the IP links affected by the failed components. We present Shrink, a tool for root cause analysis of network faults that, given a set of failed IP links, identifies the underlying cause of the faulty state. Shrink models the diagnosis problem as a Bayesian network. It makes two main contributions. First, it accounts for noisy measurements and for an inaccurate mapping between the IP and optical layers. Second, it provides an efficient inference algorithm that finds the most likely failure causes in polynomial time and with bounded errors. We compare Shrink with two prior approaches and show that it substantially outperforms them.
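The diagnosis problem described above can be illustrated with a small sketch. It assumes a two-level noisy-OR Bayesian network: each hypothetical root cause (for example a shared fiber or a router) fails independently with a prior probability; each IP link fails with probability `edge_prob` when one of its mapped causes fails, which stands in for an imperfect IP-to-optical mapping; and a `leak` probability models spurious link-failure reports. Inference scores every hypothesis with at most `k` simultaneous causes, which keeps the search polynomial (on the order of n^k hypotheses for n causes). The helper `mass_beyond_k` shows, under the same independence assumption, how much prior probability such a cap ignores. All function names, parameter values, and the toy topology are illustrative assumptions; this is not Shrink's published algorithm.

```python
import itertools
import math


def link_down_prob(parents, hypothesis, edge_prob, leak):
    """Noisy-OR: a link stays up only if no spurious alarm fires (leak) and
    every failed parent cause fails to propagate to it (1 - edge_prob)."""
    p_up = 1.0 - leak
    for cause in parents:
        if cause in hypothesis:
            p_up *= 1.0 - edge_prob
    return 1.0 - p_up


def log_score(hypothesis, priors, links, observed_down, edge_prob, leak):
    """Log joint probability of a hypothesis and the observed link states."""
    score = 0.0
    # Prior: causes in the hypothesis failed, all other causes are up.
    for cause, p in priors.items():
        score += math.log(p if cause in hypothesis else 1.0 - p)
    # Likelihood of every link observation (down or up) under the hypothesis.
    for link, parents in links.items():
        p_down = link_down_prob(parents, hypothesis, edge_prob, leak)
        score += math.log(p_down if link in observed_down else 1.0 - p_down)
    return score


def most_likely_causes(priors, links, observed_down, k=3, edge_prob=0.95, leak=0.01):
    """Exhaustively score every hypothesis with at most k failed causes."""
    if not (0.0 < leak < 1.0 and 0.0 < edge_prob < 1.0):
        raise ValueError("leak and edge_prob must lie strictly in (0, 1)")
    best, best_score = frozenset(), float("-inf")
    causes = sorted(priors)
    for size in range(k + 1):
        for combo in itertools.combinations(causes, size):
            hypothesis = frozenset(combo)
            s = log_score(hypothesis, priors, links, observed_down, edge_prob, leak)
            if s > best_score:
                best, best_score = hypothesis, s
    return best, best_score


def mass_beyond_k(n, p, k):
    """Prior probability that more than k of n independent causes fail at once."""
    return 1.0 - sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


# Hypothetical toy topology: cause -> prior, link -> causes it depends on.
TOY_PRIORS = {"fiber_A": 0.01, "fiber_B": 0.01, "router_X": 0.01}
TOY_LINKS = {
    "L1": ["fiber_A"],
    "L2": ["fiber_A"],
    "L3": ["fiber_A", "router_X"],
    "L4": ["router_X"],
    "L5": ["fiber_B"],
}

if __name__ == "__main__":
    best, score = most_likely_causes(TOY_PRIORS, TOY_LINKS, {"L1", "L2", "L3"})
    print(sorted(best), round(score, 3))
```

With links L1, L2, and L3 reported down, the sketch attributes all three failures to `fiber_A`: one shared cause explains them far more cheaply than a combination of independent faults, and it leaves the observation that L4 and L5 are still up unsurprising.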