The Power of Linear Estimators

  • Authors:
  • Gregory Valiant; Paul Valiant

  • Venue:
  • FOCS '11: Proceedings of the 2011 IEEE 52nd Annual Symposium on Foundations of Computer Science
  • Year:
  • 2011

Abstract

For a broad class of practically relevant distribution properties, which includes entropy and support size, nearly all of the proposed estimators have an especially simple form. Given a set of independent samples from a discrete distribution, these estimators tally the vector of summary statistics (the number of domain elements seen once, twice, etc. in the sample) and output the dot product between this vector of summary statistics and a fixed vector of coefficients. We term such estimators \emph{linear}. This historical proclivity towards linear estimators is slightly perplexing since, despite many efforts over nearly 60 years, all such proposed estimators have significantly suboptimal convergence compared to the bounds shown in [VV11]. Our main result, in some sense vindicating this insistence on linear estimators, is that for any property in this broad class, there exists a near-optimal linear estimator. Additionally, we give a practical, polynomial-time algorithm for constructing such estimators for any given parameters.

While this result does not yield explicit bounds on the sample complexities of these estimation tasks, we leverage the insights it provides to give explicit constructions of near-optimal linear estimators for three properties: entropy, $L_1$ distance to uniformity, and, for pairs of distributions, $L_1$ distance. Our entropy estimator, when given $O(\frac{n}{\epsilon \log n})$ independent samples from a distribution of support at most $n$, estimates the entropy of the distribution to within additive accuracy $\epsilon$, with probability of failure $o(1/\mathrm{poly}(n))$. By the recent lower bounds of [VV11], this estimator is optimal, up to constant factors, both in its dependence on $n$ and in its dependence on $\epsilon$. In particular, the inverse-linear convergence rate of this estimator resolves the main open question of [VV11], which left open the possibility that the error decreases only with the square root of the number of samples. Our distance-to-uniformity estimator, when given $O(\frac{m}{\epsilon^2 \log m})$ independent samples from any distribution, returns an $\epsilon$-accurate estimate of the $L_1$ distance to the uniform distribution of support $m$. This is constant-factor optimal for constant $\epsilon$.

Finally, our framework extends naturally to properties of pairs of distributions, including estimating the $L_1$ distance and KL-divergence between pairs of distributions. We give an explicit linear estimator for estimating $L_1$ distance to additive accuracy $\epsilon$ using $O(\frac{n}{\epsilon^2 \log n})$ samples from each distribution, which is constant-factor optimal for constant $\epsilon$. This is the first sublinear-sample estimator for this fundamental property.
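To make the "linear estimator" form concrete, below is a minimal Python sketch (not from the paper) of the fingerprint computation and the dot-product evaluation described in the abstract. The plug-in entropy coefficients are purely illustrative, chosen only because the naive plug-in estimator happens to be linear in the fingerprint; the paper's near-optimal coefficient vectors come from its own construction, which is not reproduced here.

```python
from collections import Counter
from math import log

def fingerprint(samples):
    # Fingerprint F: F[j] = number of distinct domain elements
    # that appear exactly j times in the sample.
    multiplicities = Counter(samples)        # element -> occurrence count
    return Counter(multiplicities.values())  # j -> F_j

def linear_estimate(samples, coeff):
    # A linear estimator is the dot product between the fingerprint
    # and a fixed coefficient vector, passed here as a function
    # coeff(j) so it need not be materialized up front.
    F = fingerprint(samples)
    return sum(coeff(j) * F_j for j, F_j in F.items())

def plugin_entropy_coeff(k):
    # Illustrative coefficients only: the naive plug-in entropy
    # estimator -sum_j F_j * (j/k) * log(j/k) is linear in the
    # fingerprint, though its convergence is suboptimal compared
    # with the near-optimal estimators constructed in the paper.
    return lambda j: -(j / k) * log(j / k)

samples = ["a", "b", "a", "c", "c", "c", "d"]
print(linear_estimate(samples, plugin_entropy_coeff(len(samples))))
```

On this toy sample the dot product reproduces the empirical entropy of the observed frequencies {2/7, 1/7, 3/7, 1/7}, illustrating that the estimator depends on the sample only through the fingerprint.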