A near-optimal algorithm for computing the entropy of a stream

  • Authors: Amit Chakrabarti; Graham Cormode; Andrew McGregor

  • Affiliations: Dartmouth College; Lucent Bell Laboratories; ONR

  • Venue: SODA '07 Proceedings of the eighteenth annual ACM-SIAM symposium on Discrete algorithms
  • Year: 2007

Abstract

We describe a simple algorithm for approximating the empirical entropy of a stream of m values in a single pass, using O(ε⁻² log(δ⁻¹) log m) words of space. Our algorithm is based upon a novel extension of a method introduced by Alon, Matias, and Szegedy [1]. We show a space lower bound of Ω(ε⁻² / log(ε⁻¹)), meaning that our algorithm is near-optimal in terms of its dependency on ε. This improves over previous work on this problem [8, 13, 17, 5]. We show that generalizing to kth-order entropy requires close to linear space for all k ≥ 1, and give additive approximations using our algorithm. Lastly, we show how to compute a multiplicative approximation to the entropy of a random walk on an undirected graph.
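
To make the starting point concrete, the following is a minimal Python sketch of the basic Alon-Matias-Szegedy style estimator applied to empirical entropy. It is not the paper's algorithm: the abstract describes a novel extension of this method, and that extension is not reproduced here. The function names (`empirical_entropy`, `estimator_at`, `one_pass_estimate`) and the choice of base-2 logarithms are assumptions made for illustration. The idea is to sample a uniformly random stream position, count how many times the sampled value occurs from that position onward (call it r), and output m·(f(r) − f(r−1)) with f(r) = (r/m)·log(m/r), which is an unbiased estimate of the entropy.

```python
import math
import random
from collections import Counter


def empirical_entropy(stream):
    """Exact empirical entropy (in bits) of a sequence; uses linear space."""
    m = len(stream)
    counts = Counter(stream)
    return sum((c / m) * math.log2(m / c) for c in counts.values())


def _f(r, m):
    # f(r) = (r/m) * log2(m/r), with f(0) = 0 by convention.
    return 0.0 if r == 0 else (r / m) * math.log2(m / r)


def estimator_at(stream, j):
    """Basic estimator when the sampled position is fixed to j (0-based)."""
    m = len(stream)
    # r = occurrences of stream[j] at positions j, j+1, ..., m-1.
    r = sum(1 for x in stream[j:] if x == stream[j])
    return m * (_f(r, m) - _f(r - 1, m))


def one_pass_estimate(stream, rng):
    """Single pass: reservoir-sample one position, count later repeats."""
    sample, r, m = None, 0, 0
    for x in stream:
        m += 1
        if rng.random() < 1.0 / m:  # keep the new item with probability 1/m
            sample, r = x, 1
        elif x == sample:
            r += 1
    if m == 0:
        return 0.0
    return m * (_f(r, m) - _f(r - 1, m))


if __name__ == "__main__":
    data = list("abracadabra")
    rng = random.Random(0)
    trials = [one_pass_estimate(data, rng) for _ in range(2000)]
    print("exact  :", empirical_entropy(data))
    print("average:", sum(trials) / len(trials))
```

A single estimate stores only the sampled value and two counters, but it has high variance. In the usual AMS scheme, many independent copies are averaged and a median of the averages is taken to obtain an (ε, δ) guarantee; how the paper controls the variance to reach its stated space bound is not shown in this sketch.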