Flow sampling under hard resource constraints

  • Authors:
  • Nick Duffield; Carsten Lund; Mikkel Thorup

  • Affiliations:
  • AT&T Labs–Research, Florham Park, NJ; AT&T Labs–Research, Florham Park, NJ; AT&T Labs–Research, Florham Park, NJ

  • Venue:
  • Proceedings of the Joint International Conference on Measurement and Modeling of Computer Systems
  • Year:
  • 2004

Abstract

Many network management applications take as their input traffic volumes differentiated by attributes such as IP address or port number. IP flow records are commonly collected for this purpose, since they enable fine-grained determination of network resource usage. However, the increasingly large volumes of flow statistics impose corresponding costs on the resources of the measurement infrastructure. This motivates sampling of flow records.

This paper addresses sampling strategies for flow records. Recent work has shown that non-uniform sampling is necessary to control the estimation variance arising from the observed heavy-tailed distribution of flow lengths. However, while this approach controls estimator variance, it does not place hard limits on the number of flows sampled. Such limits are often required during the arbitrary downstream sampling, resampling, and aggregation operations employed in analyzing the data.

This paper proposes a correlated sampling strategy that can select an arbitrarily small number of the "best" representatives of a set of flows. We show that usage estimates arising from such selection are unbiased, and show how to estimate their variance, both offline for modeling purposes and online during the sampling itself. The selection algorithm can be implemented in a queue-like data structure whose memory usage is uniformly bounded during measurement. Finally, we compare the complexity and performance of our scheme with those of other potential approaches.
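The abstract states the goals (a hard cap on the number of sampled flows, unbiased usage estimates, online variance estimates, bounded memory) but not the selection rule itself. As an illustration only, the Python sketch below substitutes priority sampling, a fixed-size scheme by the same three authors, and is not claimed to be the algorithm of this paper. Each flow of size x gets priority x/u with u uniform on (0, 1]; a min-heap keeps the k + 1 largest priorities; with τ the smallest retained priority, each of the top k flows is reported as max(x, τ), with variance estimate τ·max(0, τ − x). For contrast, `threshold_sample` shows independent threshold sampling (keep a flow with probability min(1, x/z) and report max(x, z)), one example of non-uniform flow sampling that is unbiased but returns a random number of flows. The names `threshold_sample`, `PrioritySampler`, `offer`, `estimates`, and the threshold `z` are hypothetical.

```python
import heapq
import itertools
import random
from typing import Dict, Hashable, Iterable, List, Optional, Tuple


def threshold_sample(
    flows: Iterable[Tuple[Hashable, float]], z: float, rng: random.Random
) -> Dict[Hashable, float]:
    """Hypothetical helper: independent threshold sampling with threshold z > 0.

    A flow of size x is kept with probability min(1, x / z) and reported as
    max(x, z) = x / probability, so estimates are unbiased, but the number
    of sampled flows is random (no hard limit).
    """
    sample: Dict[Hashable, float] = {}
    for key, x in flows:
        if rng.random() < min(1.0, x / z):
            sample[key] = max(x, z)
    return sample


class PrioritySampler:
    """Hypothetical fixed-size sampler (priority sampling, swapped in here).

    Memory is bounded: the heap never holds more than k + 1 entries,
    however many flows are offered.
    """

    def __init__(self, k: int, rng: Optional[random.Random] = None) -> None:
        if k < 1:
            raise ValueError("k must be at least 1")
        self.k = k
        self.rng = rng if rng is not None else random.Random()
        # Entries are (priority, tiebreak, key, size); the counter avoids
        # comparing keys when two priorities are equal.
        self._heap: List[Tuple[float, int, Hashable, float]] = []
        self._counter = itertools.count()

    def offer(self, key: Hashable, size: float) -> None:
        u = 1.0 - self.rng.random()  # uniform on (0, 1]
        entry = (size / u, next(self._counter), key, size)
        if len(self._heap) < self.k + 1:
            heapq.heappush(self._heap, entry)
        elif entry[0] > self._heap[0][0]:
            # Evict the current smallest priority and insert the new flow.
            heapq.heapreplace(self._heap, entry)

    def estimates(self) -> Dict[Hashable, Tuple[float, float]]:
        """Return {key: (usage estimate, variance estimate)} for at most k flows."""
        if len(self._heap) <= self.k:
            # At most k flows seen: all are kept and reported exactly.
            return {key: (size, 0.0) for _, _, key, size in self._heap}
        tau = self._heap[0][0]  # (k+1)-th largest priority
        # heap[1:] holds exactly the k flows with the largest priorities.
        return {
            key: (max(size, tau), tau * max(0.0, tau - size))
            for _, _, key, size in self._heap[1:]
        }
```

Because the heap holds at most k + 1 entries, memory stays bounded regardless of how many flows arrive, and summing the reported estimates over any subset of sampled flows gives an unbiased estimate of that subset's total usage.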