When a failure occurs in an Internet overlay connection today, it is difficult for users to determine its root cause. An overlay connection may require TCP connections between a series of overlay nodes to succeed, but users without access to the internal workings of the overlay cannot easily determine which of these connections has failed. Diagnosis using active probing is costly and may be inaccurate if probe packets are filtered or blocked. To address this problem, we develop a passive diagnosis approach that infers the most likely cause of failure using a Bayesian network that models the conditional probability of TCP failures given the IP addresses of the hosts along the overlay path. We collect failure data for 28.3 million TCP connections from the new PlanetSeer overlay monitoring system and use it to train a Bayesian network for diagnosing overlay connection failures. We evaluate the accuracy of diagnosis using this Bayesian network on a set of overlay connections generated from observations of CoDeeN traffic patterns and find that our approach can accurately diagnose failures.
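To make the idea of passive, probability-based diagnosis concrete, the following is a minimal sketch and not the Bayesian network used in this work. It assumes that each hop of an overlay path fails independently and that the overlay connection fails exactly when at least one hop fails. The function names, the smoothing parameter, and the hop labels are hypothetical and chosen only for illustration.

```python
from math import prod


def estimate_failure_prob(failed, total, alpha=1.0):
    """Laplace-smoothed estimate of P(TCP failure) for one host pair.

    `failed` and `total` are counts of passively observed connections;
    `alpha` is an assumed smoothing constant (hypothetical, not from the paper).
    """
    return (failed + alpha) / (total + 2.0 * alpha)


def diagnose(hop_failure_prob):
    """Return P(hop failed | overlay connection failed) for every hop.

    Simplifying assumptions (not from the paper): hops fail independently,
    and the overlay connection fails iff at least one hop fails.
    """
    p_all_ok = prod(1.0 - p for p in hop_failure_prob.values())
    p_overlay_fail = 1.0 - p_all_ok
    if p_overlay_fail <= 0.0:
        raise ValueError("overlay failure has zero probability under this model")
    # A failed hop implies a failed overlay connection, so Bayes' rule
    # reduces to P(hop failed) / P(overlay failed).
    return {hop: p / p_overlay_fail for hop, p in hop_failure_prob.items()}


def most_likely_cause(hop_failure_prob):
    """Pick the hop with the highest posterior failure probability."""
    posterior = diagnose(hop_failure_prob)
    return max(posterior, key=posterior.get)


if __name__ == "__main__":
    # Hypothetical per-hop observation counts (failed, total) for a three-hop path.
    counts = {
        "client->proxy": (5, 1000),
        "proxy->proxy": (2, 1000),
        "proxy->server": (40, 1000),
    }
    priors = {hop: estimate_failure_prob(f, t) for hop, (f, t) in counts.items()}
    print(most_likely_cause(priors))
```

A full Bayesian network would additionally capture dependencies between hops, for example two TCP connections that share an endpoint, which this independence assumption ignores.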