Finding frequent items in data streams

  • Authors:
  • Moses Charikar; Kevin Chen; Martin Farach-Colton

  • Affiliations:
  • Department of Computer Science, Princeton University, Princeton, NJ; Computer Science Division, University of California, Berkeley, CA; Department of Computer Science, Rutgers University, Piscataway, NJ

  • Venue:
  • Theoretical Computer Science - Special issue on automata, languages and programming
  • Year:
  • 2004

Abstract

We present a 1-pass algorithm for estimating the most frequent items in a data stream using limited storage space. Our method relies on a data structure called a COUNT SKETCH, which allows us to reliably estimate the frequencies of frequent items in the stream. Our algorithm achieves better space bounds than the previously known best algorithms for this problem for several natural distributions on the item frequencies. In addition, our algorithm leads directly to a 2-pass algorithm for the problem of estimating the items with the largest (absolute) change in frequency between two data streams. To our knowledge, this latter problem has not been previously studied in the literature.
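
As a rough illustration of the kind of data structure the abstract describes, the following is a minimal Python sketch of a count-sketch-style estimator. It is not the authors' implementation. The class name `CountSketch`, the parameters `depth` and `width`, and the use of salted BLAKE2b digests as stand-ins for the hash functions are all assumptions made for this example. Each row maps an item to one counter and a sign of +1 or -1, and the frequency estimate is the median of the signed counters across rows.

```python
import hashlib
from statistics import median


class CountSketch:
    """Illustrative count-sketch-style estimator (hypothetical names and parameters)."""

    def __init__(self, depth=5, width=256):
        self.depth = depth  # number of rows, one independent estimate per row
        self.width = width  # number of counters per row
        self.table = [[0] * width for _ in range(depth)]

    def _bucket_and_sign(self, row, item):
        # Assumption: a salted BLAKE2b digest stands in for the per-row
        # hash function and sign function of the data structure.
        digest = hashlib.blake2b(
            repr(item).encode("utf-8"),
            digest_size=8,
            salt=row.to_bytes(8, "little"),
        ).digest()
        value = int.from_bytes(digest, "little")
        sign = 1 if value & 1 else -1       # lowest bit chooses the sign
        bucket = (value >> 1) % self.width  # remaining bits choose the counter
        return bucket, sign

    def update(self, item, count=1):
        # Add the signed count to one counter in every row.
        for row in range(self.depth):
            bucket, sign = self._bucket_and_sign(row, item)
            self.table[row][bucket] += sign * count

    def estimate(self, item):
        # Each row gives one signed estimate; the median damps collisions.
        estimates = []
        for row in range(self.depth):
            bucket, sign = self._bucket_and_sign(row, item)
            estimates.append(sign * self.table[row][bucket])
        return median(estimates)


if __name__ == "__main__":
    sketch = CountSketch(depth=5, width=1024)
    for _ in range(1000):
        sketch.update("heavy")
    for i in range(50):
        sketch.update(f"light-{i}")
    print(sketch.estimate("heavy"))
```

Because every update is a signed addition, the structure is linear: feeding one stream with `count=1` and a second stream with `count=-1` leaves a sketch of the frequency differences. That property is presumably what makes the change-detection variant mentioned in the abstract possible, though the details here are only a sketch under the stated assumptions.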