Algorithms and estimators for accurate summarization of internet traffic

  • Authors:
  • Edith Cohen;Nick Duffield;Haim Kaplan;Carsten Lund;Mikkel Thorup

  • Affiliations:
  • AT&T Labs-Research, Florham Park, NJ;AT&T Labs-Research, Florham Park, NJ;Tel Aviv University, Tel Aviv, Israel;AT&T Labs-Research, Florham Park, NJ;AT&T Labs-Research, Florham Park, NJ

  • Venue:
  • Proceedings of the 7th ACM SIGCOMM conference on Internet measurement
  • Year:
  • 2007

Quantified Score

Hi-index 0.00

Visualization

Abstract

Statistical summaries of traffic in IP networks are at the heart of network operation and are used to recover information on the traffic of arbitrary subpopulations of flows. It is therefore of great importance to collect the most accurate and informative summaries given the router's resource constraints. Cisco's sampled NetFlow, based on aggregating a sampled packet stream into flows, is the most widely deployed such system. We observe two sources of inefficiency in current methods. Firstly, a single parameter (the sampling rate) is used to control utilization of both memory and processing/access speed, which means that it has to be set according to the bottleneck resource. Secondly, the unbiased estimators are applicable to summaries that in effect are collected through uneven use of resources during the measurement period (information from the earlier part of the measurement period is either not collected at all and fewer counter are utilized or discarded when performing a sampling rate adaptation). We develop algorithms that collect more informative summaries through an even and more efficient use of available resources. The heart of our approach is a novel derivation of unbiased estimators that use these more informative counts. We show how to efficiently compute these estimators and prove analytically that they are superior (have smaller variance on all packet streams and subpopulations) to previous approaches. Simulations on Pareto distributions and IP flow data show that the new summaries provide significantly more accurate estimates. We provide an implementation design that can be efficiently deployed at routers.