Pinpoint: Problem Determination in Large, Dynamic Internet Services
DSN '02 Proceedings of the 2002 International Conference on Dependable Systems and Networks
CORDS: automatic discovery of correlations and soft functional dependencies
SIGMOD '04 Proceedings of the 2004 ACM SIGMOD international conference on Management of data
Shrink: a tool for failure diagnosis in IP networks
Proceedings of the 2005 ACM SIGCOMM workshop on Mining network data
Dynamic syslog mining for network failure monitoring
Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining
Streaming pattern discovery in multiple time-series
VLDB '05 Proceedings of the 31st international conference on Very large data bases
Minerals: using data mining to detect router misconfigurations
Proceedings of the 2006 SIGCOMM workshop on Mining network data
OSPF monitoring: architecture, design and deployment experience
NSDI'04 Proceedings of the 1st conference on Symposium on Networked Systems Design and Implementation - Volume 1
IP fault localization via risk modeling
NSDI'05 Proceedings of the 2nd conference on Symposium on Networked Systems Design & Implementation - Volume 2
Towards highly reliable enterprise network services via inference of multi-level dependencies
Proceedings of the 2007 conference on Applications, technologies, architectures, and protocols for computer communications
Troubleshooting chronic conditions in large IP networks
CoNEXT '08 Proceedings of the 2008 ACM CoNEXT Conference
Detailed diagnosis in enterprise networks
Proceedings of the ACM SIGCOMM 2009 conference on Data communication
RAID'06 Proceedings of the 9th international conference on Recent Advances in Intrusion Detection
KnowOps: towards an embedded knowledge base for network management and operations
Hot-ICE'11 Proceedings of the 11th USENIX conference on Hot topics in management of internet, cloud, and enterprise networks and services
Adtributor: revenue debugging in advertising systems
NSDI'14 Proceedings of the 11th USENIX Conference on Networked Systems Design and Implementation
Hi-index | 0.00 |
As IP networks have become the mainstay of an increasingly diverse set of applications ranging from Internet games and streaming videos, to e-commerce and online-banking, and even to mission-critical 911, best effort service is no longer acceptable. This requires a transformation in network management from detecting and replacing individual faulty network elements to managing the service quality as a whole. In this paper we describe the design and development of a Generic Root Cause Analysis platform (G-RCA) for service quality management (SQM) in large IP networks. G-RCA contains a comprehensive service dependency model that includes network topological and cross-layer relationships, protocol interactions, and control plane dependencies. G-RCA abstracts the RCA process into signature identification for symptom and diagnostic events, temporal and spatial event correlation, and reasoning and inference logic. G-RCA provides a flexible rule specification language that allows operators to quickly customize G-RCA into different RCA tools as new problems need to be investigated. G-RCA is also integrated with the data trending, manual data exploration, and statistical correlation mining capabilities. G-RCA has proven to be a highly effective SQM platform in several different applications and we present results regarding BGP flaps, PIM flaps in Multicast VPN service, and end-to-end throughput drop in CDN service.