Reversible sketches for efficient and accurate change detection over network data streams

  • Authors:
  • Robert Schweller;Ashish Gupta;Elliot Parsons;Yan Chen

  • Affiliations:
  • Northwestern University, Evanston, IL;Northwestern University, Evanston, IL;Northwestern University, Evanston, IL;Northwestern University, Evanston, IL

  • Venue:
  • Proceedings of the 4th ACM SIGCOMM conference on Internet measurement
  • Year:
  • 2004

Quantified Score

Hi-index 0.00

Visualization

Abstract

Traffic anomalies such as failures and attacks are increasing in frequency and severity, and thus identifying them rapidly and accurately is critical for large network operators. The detection typically treats the traffic as a collection of flows and looks for heavy changes in traffic patterns (e.g., volume, number of connections). However, as link speeds and the number of flows increase, keeping per-flow state is not scalable. The recently proposed sketch-based schemes [14] are among the very few that can detect heavy changes and anomalies over massive data streams at network traffic speeds. However, sketches do not preserve the key (e.g., source IP address) of the flows. Hence, even if anomalies are detected, it is difficult to infer the culprit flows, making it a big practical hurdle for online deployment. Meanwhile, the number of keys is too large to record. To address this challenge, we propose efficient reversible hashing algorithms to infer the keys of culprit flows from sketches without storing any explicit key information. No extra memory or memory accesses are needed for recording the streaming data. Meanwhile, the heavy change detection daemon runs in the background with space complexity and computational time sublinear to the key space size. This short paper describes the conceptual framework of the reversible sketches, as well as some initial approaches for implementation. See [23] for the optimized algorithms in details. comment We further apply various emph IP-mangling algorithms and emph bucket classification methods to reduce the false positives and false negatives. Evaluated with netflow traffic traces of a large edge router, we demonstrate that the reverse hashing can quickly infer the keys of culprit flows even for many changes with high accuracy.