Automated, rapid, and effective fault management is a central goal of large operational IP networks. Today's networks suffer from a wide and volatile set of failure modes, where the underlying fault proves difficult to detect and localize, thereby delaying repair. One of the main challenges stems from operational reality: IP routing and the underlying optical fiber plant are typically described by disparate data models and housed in distinct network management systems. We introduce a fault-localization methodology based on the use of risk models and an associated troubleshooting system, SCORE (Spatial Correlation Engine), which automatically identifies likely root causes across layers. In particular, we apply SCORE to the problem of localizing link failures in IP and optical networks. In experiments conducted on a tier-1 ISP backbone, SCORE proved remarkably effective at localizing optical link failures using only IP-layer event logs. Moreover, SCORE was often able to automatically uncover inconsistencies in the databases that maintain the critical associations between the IP and optical networks.
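
To make the idea of a risk model concrete, the following is a minimal sketch, not SCORE's actual algorithm, which the abstract does not spell out. It assumes a risk model can be written as a mapping from each shared risk (for example, a fiber span) to the set of IP links that depend on it, and it uses a simple greedy set-cover heuristic to pick the risks that best explain a set of failed IP links. The function name `localize`, the group names, and the link names are all hypothetical.

```python
from typing import Dict, List, Set, Tuple


def localize(
    risk_groups: Dict[str, Set[str]], failed_links: Set[str]
) -> Tuple[List[str], Set[str]]:
    """Greedy sketch: choose risk groups that explain the observed failures.

    Assumption (illustrative only): a risk group is a candidate root cause
    only if every IP link that depends on it is observed as failed.
    Returns the chosen groups and any failures left unexplained.
    """
    unexplained = set(failed_links)
    hypothesis: List[str] = []
    while unexplained:
        best_name = None
        best_cover: Set[str] = set()
        # Iterate in sorted order so ties are broken deterministically.
        for name in sorted(risk_groups):
            links = risk_groups[name]
            if not links or not links <= failed_links:
                continue  # some dependent link is still up; not a candidate
            cover = links & unexplained
            if len(cover) > len(best_cover):
                best_name, best_cover = name, cover
        if best_name is None:
            break  # no remaining group explains the leftover failures
        hypothesis.append(best_name)
        unexplained -= best_cover
    return hypothesis, unexplained


# Hypothetical risk model: optical-layer components mapped to IP links.
example_risk_groups = {
    "fiber_span_A": {"ip1", "ip2", "ip3"},
    "fiber_span_B": {"ip3", "ip4"},
    "router_card_X": {"ip1"},
}
example_failed = {"ip1", "ip2", "ip3"}
causes, leftover = localize(example_risk_groups, example_failed)
```

In this sketch, failures that remain in `leftover` are ones no known risk group accounts for. Under the stated assumptions, such leftovers could hint at missing or stale entries in the risk model, which is the kind of cross-layer database inconsistency the abstract mentions.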