We consider the problem of approximating the entropy of a discrete distribution under several different models of oracle access to the distribution. In the evaluation oracle model, the algorithm is given access to the explicit array of probabilities specifying the distribution. In this model, time linear in the size of the domain is both necessary and sufficient for approximating the entropy.

In the generation oracle model, the algorithm has access only to independent samples from the distribution. In this case, we show that a $\gamma$-multiplicative approximation to the entropy can be obtained in $O(n^{(1+\eta)/\gamma^2} \log n)$ time for distributions with entropy $\Omega(\gamma/\eta)$, where $n$ is the size of the domain of the distribution and $\eta$ is an arbitrarily small positive constant. We show that this model does not permit a multiplicative approximation to the entropy in general. For the class of distributions to which our upper bound applies, we obtain a lower bound of $\Omega(n^{1/(2\gamma^2)})$.

We next consider a combined oracle model, in which the algorithm has access to both the generation and the evaluation oracles of the distribution. In this model, significantly greater efficiency can be achieved: we present an algorithm for $\gamma$-multiplicative approximation of the entropy that runs in $O((\gamma^2 \log^2 n)/(h^2 (\gamma-1)^2))$ time for distributions with entropy $\Omega(h)$; for such distributions, we also show a lower bound of $\Omega((\log n)/(h(\gamma^2-1)+\gamma^2))$.

Finally, we consider two special families of distributions: those in which the probabilities of the elements decrease monotonically with respect to a known ordering of the domain, and those that are uniform over a subset of the domain. In each case, we give more efficient algorithms for approximating the entropy.
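To illustrate why the combined oracle model is so much cheaper than the other two, here is a minimal Python sketch of the standard empirical estimator $\hat{H} = \frac{1}{t}\sum_{i=1}^{t} \log(1/p(x_i))$, where $x_1, \dots, x_t$ are drawn from the generation oracle and each $p(x_i)$ is read from the evaluation oracle. This estimator is unbiased, since $\mathbb{E}_{x \sim p}[\log(1/p(x))] = \sum_x p(x)\log(1/p(x)) = H(p)$. The oracle names `sample` and `evaluate` are hypothetical, and this is a natural estimator consistent with the stated bounds, not necessarily the authors' exact algorithm.

```python
import math
import random

def estimate_entropy_combined(sample, evaluate, t):
    """Estimate H(p) = sum_x p(x) * log2(1/p(x)) from t oracle calls.

    sample()    -- generation oracle: returns one x distributed as p
    evaluate(x) -- evaluation oracle: returns the probability p(x)

    The average of log2(1/p(x)) over draws x ~ p is an unbiased
    estimate of the entropy, since E[log2(1/p(x))] = H(p).
    """
    total = 0.0
    for _ in range(t):
        x = sample()
        total += math.log2(1.0 / evaluate(x))
    return total / t

# Toy distribution given as an explicit array, so both oracles exist.
probs = [0.5, 0.25, 0.125, 0.125]           # H(p) = 1.75 bits
domain = range(len(probs))
gen = lambda: random.choices(domain, weights=probs)[0]
eva = lambda x: probs[x]

print(estimate_entropy_combined(gen, eva, t=10_000))  # approx. 1.75
```

By contrast, in the pure evaluation oracle model one essentially reads the whole array, computing $\sum_x p(x)\log(1/p(x))$ directly in time linear in $n$, matching the linear lower bound stated above. Intuitively, the entropy lower bound $\Omega(h)$ is what keeps the relative deviation of the sampled estimate under control, which is reflected in the $h^2$ factor in the combined-oracle running time.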