Impact of packet sampling on anomaly detection metrics

  • Authors:
  • Daniela Brauckhoff; Bernhard Tellenbach; Arno Wagner; Martin May; Anukool Lakhina

  • Affiliations:
  • Swiss Federal Institute of Technology (ETH), Zurich, Switzerland; Swiss Federal Institute of Technology (ETH), Zurich, Switzerland; Swiss Federal Institute of Technology (ETH), Zurich, Switzerland; Swiss Federal Institute of Technology (ETH), Zurich, Switzerland; Boston University, Boston, MA

  • Venue:
  • Proceedings of the 6th ACM SIGCOMM conference on Internet measurement
  • Year:
  • 2006

Abstract

Packet sampling methods such as Cisco's NetFlow are widely employed by large networks to reduce the amount of traffic data measured. A key problem with packet sampling is that it is inherently a lossy process, discarding (potentially useful) information. In this paper, we empirically evaluate the impact of sampling on anomaly detection metrics. Starting with unsampled flow records collected during the Blaster worm outbreak, we reconstruct the underlying packet trace and simulate packet sampling at increasing rates. We then use our knowledge of the Blaster anomaly to build a baseline of normal traffic (without Blaster), against which we can measure the anomaly size at various sampling rates. This approach allows us to evaluate the impact of packet sampling on anomaly detection without being restricted to (or biased by) a particular anomaly detection method.

We find that packet sampling does not disturb the anomaly size when measured in volume metrics such as the number of bytes and number of packets, but grossly biases the number of flows. However, we find that recently proposed entropy-based summarizations of packet and flow counts are affected less by sampling, and expose the Blaster worm outbreak even at higher sampling rates. Our findings suggest that entropy summarizations are more resilient to sampling than volume metrics. Thus, while not perfect, sampling still preserves sufficient distributional structure, which when harnessed by tools like entropy, can expose hard-to-detect scanning anomalies.
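
The effect described in the abstract can be illustrated with a small simulation. The following is a minimal sketch, not the authors' experimental setup: it assumes independent random packet sampling with a keep probability `keep_prob`, a synthetic trace produced by a hypothetical `make_trace` helper, and destination port as the feature whose distribution is summarized by entropy. All function names and parameters are illustrative assumptions.

```python
import math
import random
from collections import Counter


def make_trace(rng, n_flows=2000):
    """Build a synthetic trace (an assumption, not real data).

    Each packet is a tuple (flow_id, dst_port, size_bytes). Most flows are
    short (1 to 3 packets); every 100th flow is long (200 packets).
    """
    packets = []
    for flow_id in range(n_flows):
        n_pkts = 200 if flow_id % 100 == 0 else rng.randint(1, 3)
        dst_port = rng.choice([80, 443, 25, 53, 135])
        packets.extend((flow_id, dst_port, 500) for _ in range(n_pkts))
    return packets


def sample_packets(packets, keep_prob, rng):
    """Keep each packet independently with probability keep_prob."""
    return [p for p in packets if rng.random() < keep_prob]


def flow_count(packets):
    """Number of distinct flow ids that appear in the packet list."""
    return len({flow_id for flow_id, _, _ in packets})


def sample_entropy(values):
    """Shannon entropy in bits of the empirical distribution of values."""
    counts = Counter(values)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def summarize(packets, keep_prob=1.0):
    """Return volume metrics (scaled by 1/keep_prob), raw flow count, and entropy."""
    return {
        "packets_est": len(packets) / keep_prob,
        "bytes_est": sum(size for _, _, size in packets) / keep_prob,
        "flows_seen": flow_count(packets),
        "dst_port_entropy": sample_entropy(port for _, port, _ in packets),
    }


if __name__ == "__main__":
    rng = random.Random(0)
    trace = make_trace(rng)
    sampled = sample_packets(trace, keep_prob=0.1, rng=rng)
    print("unsampled:", summarize(trace))
    print("sampled:  ", summarize(sampled, keep_prob=0.1))
```

In this toy setting, packet and byte counts can be rescaled by `1 / keep_prob` to approximate the unsampled totals, while many short flows are missed entirely, so the number of flows seen has no equally simple correction. The entropy of the destination port distribution depends on relative frequencies rather than absolute counts, which is one intuition for why it changes less under sampling.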