Rapid detection of maintenance induced changes in service performance

  • Authors:
  • Ajay Mahimkar;Zihui Ge;Jia Wang;Jennifer Yates;Yin Zhang;Joanne Emmons;Brian Huntley;Mark Stockert

  • Affiliations:
  • AT&T Labs -- Research;AT&T Labs -- Research;AT&T Labs -- Research;AT&T Labs -- Research;The University of Texas at Austin;AT&T Labs -- Research;AT&T, Inc.;AT&T Labs -- Research

  • Venue:
  • Proceedings of the Seventh COnference on emerging Networking EXperiments and Technologies
  • Year:
  • 2011

Abstract

Service quality in operational IP networks can be affected by planned or unplanned maintenance. During any maintenance activity, the responsibility of the operations team is to complete the work order and perform a check-up to ensure there are no unexpected service disruptions. Once the maintenance is complete, it is crucial to continuously monitor the network and look for any performance impacts. What operations teams lack today are effective tools to rapidly detect maintenance-induced performance changes. The large scale and heterogeneity of network elements and performance metrics make the problem extremely challenging. In this paper, we present PRISM, a new tool for detecting maintenance-induced performance changes in a timely fashion. PRISM uses the association between maintenance activities and network elements to identify performance metrics for time-series analysis. It uses a new Multiscale Robust Local Subspace algorithm (MRLS) to accurately identify changes in performance even when the baseline is contaminated. We systematically evaluate PRISM using data collected at four large operational networks (a tier-1 backbone, VoIP, IPTV, and 3G cellular) and show that it achieves good accuracy. We also demonstrate the effectiveness of PRISM in real operational environments through interesting case study findings.
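
To make the idea of a contaminated baseline concrete, the following is a minimal sketch of a robust before/after comparison for one performance metric around a maintenance event. It is not MRLS, whose details the abstract does not give; it swaps in a plain median and median-absolute-deviation (MAD) test. The function names, the threshold of 3.0, and the sample values are all hypothetical.

```python
from statistics import median


def robust_shift_score(before, after):
    """Level shift between two windows, in robust standard deviations.

    Illustrative only: a median/MAD comparison, not the MRLS algorithm.
    """
    base = median(before)
    # The MAD is barely moved by a few outliers in the baseline window.
    mad = median(abs(x - base) for x in before)
    # 1.4826 scales the MAD to a standard deviation for Gaussian data;
    # the tiny floor (an assumption) avoids dividing by zero.
    scale = 1.4826 * mad if mad > 0 else 1e-9
    return (median(after) - base) / scale


def changed(before, after, threshold=3.0):
    """Flag a change when the robust score exceeds a hypothetical threshold."""
    return abs(robust_shift_score(before, after)) > threshold


# Made-up metric samples; the 95 is an outlier contaminating the baseline.
before = [10, 11, 9, 10, 95, 10, 11, 9, 10, 10]
after_shift = [14, 15, 14, 16, 15, 14, 15, 15]
after_same = [10, 9, 11, 10, 10, 11, 9, 10]

print(changed(before, after_shift))  # True
print(changed(before, after_same))   # False
```

A mean and standard deviation computed over the same baseline would be inflated by the single outlier and would miss the shift, which is the motivation for a robust estimator in this setting.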