The Complexity of Approximating the Entropy

  • Authors:
  • Tugkan Batu; Sanjoy Dasgupta; Ravi Kumar; Ronitt Rubinfeld

  • Venue:
  • SIAM Journal on Computing
  • Year:
  • 2005

Abstract

We consider the problem of approximating the entropy of a discrete distribution under several different models of oracle access to the distribution. In the evaluation oracle model, the algorithm is given access to the explicit array of probabilities specifying the distribution. In this model, linear time in the size of the domain is both necessary and sufficient for approximating the entropy.

In the generation oracle model, the algorithm has access only to independent samples from the distribution. In this case, we show that a $\gamma$-multiplicative approximation to the entropy can be obtained in $O(n^{(1+\eta)/\gamma^2} \log n)$ time for distributions with entropy $\Omega(\gamma/\eta)$, where $n$ is the size of the domain of the distribution and $\eta$ is an arbitrarily small positive constant. We show that this model does not permit a multiplicative approximation to the entropy in general. For the class of distributions to which our upper bound applies, we obtain a lower bound of $\Omega(n^{1/(2\gamma^2)})$.

We next consider a combined oracle model in which the algorithm has access to both the generation and the evaluation oracles of the distribution. In this model, significantly greater efficiency can be achieved: we present an algorithm for $\gamma$-multiplicative approximation to the entropy that runs in $O((\gamma^2 \log^2 n)/(h^2 (\gamma-1)^2))$ time for distributions with entropy $\Omega(h)$; for such distributions, we also show a lower bound of $\Omega((\log n)/(h(\gamma^2-1)+\gamma^2))$.

Finally, we consider two special families of distributions: those in which the probabilities of the elements decrease monotonically with respect to a known ordering of the domain, and those that are uniform over a subset of the domain. In each case, we give more efficient algorithms for approximating the entropy.
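
To make the two basic access models concrete, the sketch below computes the entropy exactly from an explicit probability array (the evaluation oracle setting, linear in the domain size) and contrasts it with a naive plug-in estimate built from i.i.d. samples (the generation oracle setting). This is only an illustration of the access models, not the paper's algorithms; the example distribution, function names, and sample size are hypothetical.

```python
import math
import random

def entropy_from_probabilities(probs):
    """Exact entropy (in bits) from an explicit probability array,
    i.e., the evaluation oracle setting: one linear pass over the domain."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def empirical_entropy(sample_fn, num_samples):
    """Naive plug-in estimate from i.i.d. samples (generation oracle setting):
    count sample frequencies and return the entropy of the empirical
    distribution. This is NOT the paper's algorithm; it only illustrates
    sample access, and it can be badly biased for small sample sizes."""
    counts = {}
    for _ in range(num_samples):
        x = sample_fn()
        counts[x] = counts.get(x, 0) + 1
    return entropy_from_probabilities([c / num_samples for c in counts.values()])

if __name__ == "__main__":
    # Hypothetical distribution over a domain of size 8 (entropy = 1.984375 bits).
    probs = [0.5, 0.25, 0.125, 0.0625, 0.03125, 0.015625, 0.0078125, 0.0078125]
    print("exact entropy:", entropy_from_probabilities(probs))
    sampler = lambda: random.choices(range(len(probs)), weights=probs)[0]
    print("plug-in estimate:", empirical_entropy(sampler, 10_000))
```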