Natural and human factors cause Internet outages, from big events like Hurricane Sandy in 2012 and the Egyptian Internet shutdown in Jan. 2011 to small outages every day that go unpublicized. We describe Trinocular, an outage detection system that uses active probing to understand the reliability of edge networks. Trinocular is principled: it derives a simple model of the Internet that captures the information pertinent to outages, populates that model through long-term data, and learns current network state through ICMP probes. It is parsimonious, using Bayesian inference to determine how many probes are needed. On average, each Trinocular instance sends fewer than 20 probes per hour to each /24 network block under study, increasing Internet "background radiation" by less than 0.7%. Trinocular is also predictable and precise: we provide known precision in outage timing and duration. Probing in rounds of 11 minutes, we detect 100% of outages lasting one round or longer, and estimate outage duration to within one-half round. Because we require little traffic, a single machine can track 3.4M /24 IPv4 blocks, all of the Internet currently suitable for analysis. We show that our approach is significantly more accurate than the best current methods, with about one-third fewer false conclusions and about 30% greater coverage at constant accuracy. We validate our approach using controlled experiments, use Trinocular to analyze two days of Internet outages observed from three sites, and re-analyze three years of existing data to develop trends for the Internet.
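To illustrate how Bayesian inference can bound the number of probes, the following is a minimal sketch and not the system's actual model. It assumes each block has a long-term availability (the fraction of probed addresses that normally reply), a small chance of a reply from a block that is down, and fixed belief thresholds at which probing stops. All names and numeric values (`update_belief`, `probes_until_confident`, `false_positive`, the 0.9 and 0.1 thresholds, the 15-probe cap) are hypothetical choices for illustration.

```python
def update_belief(belief_up, probe_positive, availability, false_positive=0.01):
    """Apply Bayes' rule once: return P(block is up) after one probe result.

    availability   -- assumed P(reply | block up), from long-term data
    false_positive -- assumed P(reply | block down); illustrative value
    """
    if probe_positive:
        like_up, like_down = availability, false_positive
    else:
        like_up, like_down = 1.0 - availability, 1.0 - false_positive
    numer = like_up * belief_up
    denom = numer + like_down * (1.0 - belief_up)
    return numer / denom


def probes_until_confident(outcomes, availability, prior_up=0.5,
                           up_threshold=0.9, down_threshold=0.1,
                           max_probes=15):
    """Consume probe outcomes until the belief crosses a threshold.

    Returns (status, probes_sent, belief); status is "up", "down",
    or "unknown" if neither threshold is reached within max_probes.
    """
    belief = prior_up
    sent = 0
    for positive in outcomes:
        if sent >= max_probes:
            break
        belief = update_belief(belief, positive, availability)
        sent += 1
        if belief >= up_threshold:
            return "up", sent, belief
        if belief <= down_threshold:
            return "down", sent, belief
    return "unknown", sent, belief


if __name__ == "__main__":
    # A single reply is strong evidence that the block is up.
    print(probes_until_confident([True], availability=0.5))
    # Silence from a sparsely responsive block needs several probes.
    print(probes_until_confident([False] * 10, availability=0.5))
    # Silence from a densely responsive block is conclusive sooner.
    print(probes_until_confident([False] * 10, availability=0.9))
```

Under these assumed parameters, a block where half the addresses normally reply needs four unanswered probes before it is declared down, while a block with 90% availability needs only one. This is the sense in which stopping on belief, rather than sending a fixed number of probes, keeps traffic low.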