G-RCA: a generic root cause analysis platform for service quality management in large IP networks

  • Authors:
  • He Yan; Lee Breslau; Zihui Ge; Dan Massey; Dan Pei; Jennifer Yates

  • Affiliations:
  • Colorado State University; AT&T Labs - Research; AT&T Labs - Research; Colorado State University; AT&T Labs - Research; AT&T Labs - Research

  • Venue:
  • Proceedings of the 6th International Conference
  • Year:
  • 2010

Abstract

As IP networks have become the mainstay of an increasingly diverse set of applications, ranging from Internet games and streaming video to e-commerce and online banking, and even to mission-critical 911, best-effort service is no longer acceptable. This requires a transformation in network management, from detecting and replacing individual faulty network elements to managing service quality as a whole. In this paper we describe the design and development of a Generic Root Cause Analysis platform (G-RCA) for service quality management (SQM) in large IP networks. G-RCA contains a comprehensive service dependency model that includes network topological and cross-layer relationships, protocol interactions, and control plane dependencies. G-RCA abstracts the RCA process into three parts: signature identification for symptom and diagnostic events, temporal and spatial event correlation, and reasoning and inference logic. G-RCA provides a flexible rule specification language that allows operators to quickly customize G-RCA into different RCA tools as new problems need to be investigated. G-RCA is also integrated with data trending, manual data exploration, and statistical correlation mining capabilities. G-RCA has proven to be a highly effective SQM platform in several different applications, and we present results for BGP flaps, PIM flaps in Multicast VPN service, and end-to-end throughput drops in CDN service.
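
The abstract names three stages (event signatures, temporal and spatial correlation, and reasoning over the matches) without showing what they might look like in practice. The Python sketch below is only an illustration of that general idea, not G-RCA's rule specification language or its dependency model. Every name in it (`Event`, `Rule`, `DEPENDS_ON`, `diagnose`, the event names, locations, windows, and priorities) is a hypothetical assumption made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    name: str       # event signature, e.g. "bgp-flap" (hypothetical)
    location: str   # network element where the event was observed
    start: float    # start time in seconds
    end: float      # end time in seconds


@dataclass(frozen=True)
class Rule:
    symptom: str     # symptom event signature this rule applies to
    diagnostic: str  # diagnostic event signature that may explain it
    window: float    # allowed time slack in seconds (temporal join)
    priority: int    # higher priority wins when several rules match


# Hypothetical dependency model: location -> locations it depends on.
DEPENDS_ON = {
    "router-a:if1": {"router-a:if1", "link-7", "router-a"},
}


def temporally_joined(s: Event, d: Event, window: float) -> bool:
    # True if the diagnostic interval, widened by `window`, overlaps the symptom.
    return d.start - window <= s.end and s.start <= d.end + window


def spatially_joined(s: Event, d: Event, deps: dict) -> bool:
    # True if the diagnostic sits on something the symptom location depends on.
    return d.location in deps.get(s.location, {s.location})


def diagnose(symptom: Event, diagnostics: list, rules: list, deps: dict):
    # Collect every diagnostic event that some rule joins to the symptom,
    # then return the one matched by the highest-priority rule (or None).
    candidates = [
        (r.priority, d)
        for r in rules
        if r.symptom == symptom.name
        for d in diagnostics
        if d.name == r.diagnostic
        and temporally_joined(symptom, d, r.window)
        and spatially_joined(symptom, d, deps)
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda c: c[0])[1]


# Made-up example data, for illustration only.
symptom = Event("bgp-flap", "router-a:if1", 100.0, 101.0)
diagnostics = [
    Event("link-down", "link-7", 98.0, 99.0),
    Event("cpu-high", "router-a", 90.0, 95.0),
    Event("link-down", "link-9", 99.0, 100.0),  # unrelated location
]
rules = [
    Rule("bgp-flap", "link-down", window=5.0, priority=2),
    Rule("bgp-flap", "cpu-high", window=10.0, priority=1),
]
root_cause = diagnose(symptom, diagnostics, rules, DEPENDS_ON)
```

In this toy setup both the `link-down` event on `link-7` and the `cpu-high` event on `router-a` join with the symptom, and the priority field decides between them. A real platform would derive the dependency map from topology and routing data rather than from a hand-written dictionary.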