Space-efficient tracking of persistent items in a massive data stream

  • Authors:
  • Bibudh Lahiri;Jaideep Chandrashekar;Srikanta Tirthapura

  • Affiliations:
  • Iowa State University, Ames, IA, USA;Intel Labs Berkeley, Berkeley, CA, USA;Iowa State University, Ames, IA, USA

  • Venue:
  • Proceedings of the 5th ACM international conference on Distributed event-based systems
  • Year:
  • 2011

Abstract

Motivated by scenarios in network anomaly detection, we consider the problem of detecting persistent items in a data stream, that is, items that occur "regularly" in the stream. In contrast with heavy-hitters, persistent items do not necessarily contribute significantly to the volume of a stream, and may escape detection by traditional volume-based anomaly detectors. We first show that any online algorithm that tracks persistent items exactly must necessarily use a large workspace, and is therefore infeasible to run on a traffic monitoring node. In light of this lower bound, we introduce an approximate formulation of the problem and present a small-space algorithm to approximately track persistent items over a large data stream. Our experiments on a real traffic dataset show that in typical cases, the algorithm achieves a physical space compression of 5x-7x, while incurring very few false positives (
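
To make the contrast with heavy-hitters concrete, the following is a minimal sketch of exact persistence tracking. It is an illustrative baseline, not the paper's small-space algorithm. The abstract does not define "regularly", so the sketch assumes one possible formalization: time is divided into equal-length slots, and an item is persistent if it appears in at least a fraction `alpha` of them. The function name `exact_persistent_items` and its parameters are hypothetical.

```python
from collections import defaultdict


def exact_persistent_items(stream, slot_length, num_slots, alpha):
    """Return the items seen in at least alpha * num_slots distinct time slots.

    Assumption (not from the abstract): `stream` is an iterable of
    (timestamp, item) pairs, and slot i covers the time interval
    [i * slot_length, (i + 1) * slot_length).
    """
    # item -> set of slot indices in which the item occurred.
    # Memory grows with the number of distinct items, which is the kind of
    # cost that makes exact tracking impractical on a monitoring node.
    slots_seen = defaultdict(set)
    for timestamp, item in stream:
        slot = int(timestamp // slot_length)
        if 0 <= slot < num_slots:
            slots_seen[item].add(slot)

    threshold = alpha * num_slots
    return {item for item, slots in slots_seen.items() if len(slots) >= threshold}


# "beacon" occurs once in each of 10 slots; "bulk" occurs 100 times in slot 0.
stream = [(t, "beacon") for t in range(10)] + [(0.5, "bulk")] * 100
persistent = exact_persistent_items(stream, slot_length=1, num_slots=10, alpha=0.8)
```

In this example `bulk` dominates the stream by volume but occupies a single slot, while `beacon` contributes only 10 occurrences yet appears in every slot, so only `beacon` is reported as persistent.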