We consider the problem of approximating the entropy of a discrete distribution under several different models of oracle access to the distribution. In the evaluation oracle model, the algorithm is given access to the explicit array of probabilities specifying the distribution. In this model, time linear in the size of the domain is both necessary and sufficient for approximating the entropy.

In the generation oracle model, the algorithm has access only to independent samples from the distribution. In this case, we show that a $\gamma$-multiplicative approximation to the entropy can be obtained in $O(n^{(1+\eta)/\gamma^2} \log n)$ time for distributions with entropy $\Omega(\gamma/\eta)$, where $n$ is the size of the domain of the distribution and $\eta$ is an arbitrarily small positive constant. We show that this model does not permit a multiplicative approximation to the entropy in general. For the class of distributions to which our upper bound applies, we obtain a lower bound of $\Omega(n^{1/(2\gamma^2)})$.

We next consider a combined oracle model, in which the algorithm has access to both the generation and the evaluation oracles of the distribution. In this model, significantly greater efficiency can be achieved: we present an algorithm for $\gamma$-multiplicative approximation of the entropy that runs in $O((\gamma^2 \log^2 n)/(h^2 (\gamma-1)^2))$ time for distributions with entropy $\Omega(h)$; for such distributions, we also show a lower bound of $\Omega((\log n)/(h(\gamma^2-1)+\gamma^2))$.

Finally, we consider two special families of distributions: those in which the probabilities of the elements decrease monotonically with respect to a known ordering of the domain, and those that are uniform over a subset of the domain. In each case, we give more efficient algorithms for approximating the entropy.
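To illustrate why the combined oracle model is so much cheaper than the other two, here is a minimal Python sketch of the standard empirical estimator $\hat{H} = \frac{1}{t}\sum_{i=1}^{t} \log(1/p(x_i))$, where $x_1, \dots, x_t$ are drawn from the generation oracle and each $p(x_i)$ is read from the evaluation oracle. This estimator is unbiased, since $\mathbb{E}_{x \sim p}[\log(1/p(x))] = \sum_x p(x)\log(1/p(x)) = H(p)$. The oracle names `sample` and `evaluate` are hypothetical, and this is a natural estimator consistent with the stated bounds, not necessarily the authors' exact algorithm.

```python
import math
import random

def estimate_entropy_combined(sample, evaluate, t):
    """Estimate H(p) = sum_x p(x) * log2(1/p(x)) from t oracle calls.

    sample()    -- generation oracle: returns one x distributed as p
    evaluate(x) -- evaluation oracle: returns the probability p(x)

    The average of log2(1/p(x)) over draws x ~ p is an unbiased
    estimate of the entropy, since E[log2(1/p(x))] = H(p).
    """
    total = 0.0
    for _ in range(t):
        x = sample()
        total += math.log2(1.0 / evaluate(x))
    return total / t

# Toy distribution given as an explicit array, so both oracles exist.
probs = [0.5, 0.25, 0.125, 0.125]           # H(p) = 1.75 bits
domain = range(len(probs))
gen = lambda: random.choices(domain, weights=probs)[0]
eva = lambda x: probs[x]

print(estimate_entropy_combined(gen, eva, t=10_000))  # approx. 1.75
```

By contrast, in the pure evaluation oracle model one essentially reads the whole array, computing $\sum_x p(x)\log(1/p(x))$ directly in time linear in $n$, matching the linear lower bound stated above. Intuitively, the entropy lower bound $\Omega(h)$ is what keeps the relative deviation of the sampled estimate under control, which is reflected in the $h^2$ factor in the combined-oracle running time.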