A randomized algorithm for finding frequent elements in streams using O(log log N) space

  • Authors:
  • Masatora Ogata;Yukiko Yamauchi;Shuji Kijima;Masafumi Yamashita

  • Affiliations:
  • Department of Informatics, Graduate School of Information Science and Electrical Engineering, Kyushu University, Fukuoka, Japan;Department of Informatics, Graduate School of Information Science and Electrical Engineering, Kyushu University, Fukuoka, Japan;Department of Informatics, Graduate School of Information Science and Electrical Engineering, Kyushu University, Fukuoka, Japan;Department of Informatics, Graduate School of Information Science and Electrical Engineering, Kyushu University, Fukuoka, Japan

  • Venue:
  • ISAAC'11 Proceedings of the 22nd international conference on Algorithms and Computation
  • Year:
  • 2011

Abstract

Finding frequent items in a data stream is a fundamental problem: given a threshold $\theta \in (0,1)$, find the items appearing more than $\theta \cdot N$ times in an input stream of length $N$. Karp, Shenker, and Papadimitriou (2003) gave a simple deterministic online algorithm that allows false-positive outputs and uses $O(\theta^{-1} \log N)$ bits of memory; they also gave a lower bound. Motivated by this theoretical bound on the space complexity, this paper proposes a simple randomized online algorithm using $O(\theta^{-2} \log^2 \theta^{-1} + \log\log N)$ bits of memory, where the approximation parameters are hidden in the constant. Our algorithm is robust against memory overflow, compared with other naïve randomized algorithms or with deterministic algorithms using $O(\log N)$ bits of memory. We also give some randomized algorithms for approximate counting.
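
For orientation, the deterministic baseline mentioned in the abstract can be sketched as follows. This is a minimal illustration of the well-known counter-based scheme usually attributed to Karp, Shenker, and Papadimitriou, not the randomized algorithm this paper proposes. The function name `frequent_candidates` and the choice of $\lceil 1/\theta \rceil - 1$ counters are assumptions made for the example. The output may contain false positives, but it should contain every item that appears more than $\theta \cdot N$ times.

```python
import math
from typing import Dict, Hashable, Iterable, Set


def frequent_candidates(stream: Iterable[Hashable], theta: float) -> Set[Hashable]:
    """Return a candidate set containing every item with frequency > theta * N.

    Hypothetical helper for illustration: one pass, at most
    ceil(1/theta) - 1 counters, false positives allowed.
    """
    if not 0.0 < theta < 1.0:
        raise ValueError("theta must lie in (0, 1)")

    k = math.ceil(1.0 / theta) - 1  # number of counters kept (assumed choice)
    counters: Dict[Hashable, int] = {}

    for item in stream:
        if item in counters:
            # Item is already tracked: count one more occurrence.
            counters[item] += 1
        elif len(counters) < k:
            # A free counter is available: start tracking this item.
            counters[item] = 1
        else:
            # No free counter: decrement every counter and drop the zeros.
            # Each such step cancels k + 1 distinct occurrences at once.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]

    return set(counters)


if __name__ == "__main__":
    # "a" appears 5 times in a stream of length 8, so it exceeds 0.5 * 8.
    print(frequent_candidates("abacaada", theta=0.5))
```

Each stored counter can grow as large as $N$, which is where the $\log N$ factor in the deterministic bound comes from. A second pass over the stream would be needed to remove false positives from the candidate set.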