Of the major factors affecting end-to-end service availability, network component failure is perhaps the least well understood. How often do failures occur, how long do they last, what are their causes, and how do they impact customers? Traditionally, answering questions such as these has required dedicated (and often expensive) instrumentation broadly deployed across a network. We propose an alternative approach: opportunistically mining "low-quality" data sources that are already available in modern network environments. We describe a methodology for recreating a succinct history of failure events in an IP network using a combination of structured data (router configurations and syslogs) and semi-structured data (email logs). Using this technique we analyze over five years of failure events in a large regional network consisting of over 200 routers; to our knowledge, this is the largest study of its kind.