A simple algorithm for finding frequent elements in streams and bags

  • Authors:
  • Richard M. Karp; Scott Shenker; Christos H. Papadimitriou

  • Affiliations:
  • International Computer Science Institute and University of California, Berkeley, California; International Computer Science Institute and University of California, Berkeley, California; University of California, Berkeley, California

  • Venue:
  • ACM Transactions on Database Systems (TODS)
  • Year:
  • 2003

Abstract

We present a simple, exact algorithm for identifying in a multiset the items with frequency more than a threshold θ. The algorithm requires two passes, linear time, and space 1/θ. The first pass is an on-line algorithm, generalizing a well-known algorithm for finding a majority element, for identifying a set of at most 1/θ items that includes, possibly among others, all items with frequency greater than θ.
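The two passes described in the abstract can be sketched roughly as follows. This is a minimal illustration written from the abstract alone, not the authors' published pseudocode: the function names `candidates` and `frequent_items`, the dictionary-based counters, and the assumption that the input is an in-memory list that can be read twice are all choices made here for the example.

```python
import math
from collections import Counter


def candidates(stream, theta):
    """First pass (on-line): return at most 1/theta candidate items.

    Hypothetical helper. Every item whose relative frequency exceeds
    theta is guaranteed to be in the result; other items may be too.
    """
    capacity = math.floor(1 / theta)  # never hold more than 1/theta counters
    counts = {}
    for x in stream:
        counts[x] = counts.get(x, 0) + 1
        if len(counts) > capacity:
            # Decrement every counter and drop those that reach zero.
            # With a single counter this reduces to the majority-vote idea.
            counts = {k: c - 1 for k, c in counts.items() if c > 1}
    return set(counts)


def frequent_items(items, theta):
    """Two-pass exact answer: items occurring more than theta * len(items) times."""
    cand = candidates(items, theta)                     # pass 1: candidates
    exact = Counter(x for x in items if x in cand)      # pass 2: exact counts
    n = len(items)
    return {x for x, c in exact.items() if c > theta * n}


if __name__ == "__main__":
    data = [1, 2, 1, 3, 1, 4, 1, 5]
    print(frequent_items(data, 0.4))  # prints {1}
```

Each decrement step removes more than 1/θ counted occurrences, so for an input of N items there can be at most θN such steps, which is why an item occurring more than θN times always survives the first pass. The second pass is needed because the surviving set may also contain infrequent items.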