A Coincidence-Based Test for Uniformity Given Very Sparsely Sampled Discrete Data

Authors:
L. Paninski
Affiliations:
Dept. of Stat., Columbia Univ., New York, NY
Venue:
IEEE Transactions on Information Theory
Year:
2008

Citing 0
Cited 5

Entropy estimation for real-time encrypted traffic identification
TMA'11 Proceedings of the Third international conference on Traffic monitoring and analysis

Privacy-preserving outsourcing of brute-force key searches
Proceedings of the 3rd ACM workshop on Cloud computing security workshop

Approximating and testing k-histogram distributions in sub-linear time
PODS '12 Proceedings of the 31st symposium on Principles of Database Systems

Taming big probability distributions
XRDS: Crossroads, The ACM Magazine for Students - Big Data

Testing Closeness of Discrete Distributions
Journal of the ACM (JACM)

Abstract

How many independent samples $N$ do we need from a distribution $p$ to decide that $p$ is $\epsilon$-distant from uniform in an $L_1$ sense, $\sum_{i=1}^{m} |p(i) - 1/m| > \epsilon$? (Here $m$ is the number of bins on which the distribution is supported, and is assumed known a priori.) Somewhat surprisingly, we only need $N \epsilon^2 \gg m^{1/2}$ to make this decision reliably (this condition is both sufficient and necessary). The test for uniformity introduced here is based on the number of observed "coincidences" (samples that fall into the same bin), the mean and variance of which may be computed explicitly for the uniform distribution and bounded nonparametrically for any distribution that is known to be $\epsilon$-distant from uniform. Some connections to the classical birthday problem are noted.
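
To make the idea of a coincidence-based test concrete, here is a minimal Python sketch. It is not the paper's exact statistic or threshold, which the abstract does not specify. It swaps in a standard pairwise collision count: the number of sample pairs that land in the same bin. Under the uniform distribution the expected number of such pairs is $\binom{N}{2}/m$, and for any $p$ that is $\epsilon$-distant from uniform in $L_1$ it is at least $\binom{N}{2}(1 + \epsilon^2)/m$, by the Cauchy-Schwarz inequality. The function names `colliding_pairs` and `looks_nonuniform` and the midpoint threshold factor $1 + \epsilon^2/2$ are assumptions made for illustration.

```python
from collections import Counter
from math import comb
import random


def colliding_pairs(samples):
    """Count unordered pairs of samples that fall into the same bin."""
    counts = Counter(samples)
    return sum(comb(c, 2) for c in counts.values())


def looks_nonuniform(samples, m, eps):
    """Flag the sample as non-uniform if it has too many colliding pairs.

    Assumption: the threshold sits halfway between the uniform mean,
    C(N, 2) / m, and the lower bound C(N, 2) * (1 + eps**2) / m that holds
    for any distribution at L1 distance eps from uniform on m bins.
    """
    n = len(samples)
    expected_uniform = comb(n, 2) / m
    threshold = expected_uniform * (1 + eps ** 2 / 2)
    return colliding_pairs(samples) > threshold


if __name__ == "__main__":
    rng = random.Random(0)
    m, n, eps = 10_000, 2_000, 0.5
    # Uniform draws over m bins.
    uniform = [rng.randrange(m) for _ in range(n)]
    # Draws confined to half of the bins (L1 distance 1 from uniform).
    skewed = [rng.randrange(m // 2) for _ in range(n)]
    print(colliding_pairs(uniform), looks_nonuniform(uniform, m, eps))
    print(colliding_pairs(skewed), looks_nonuniform(skewed, m, eps))
```

The decision is random, so a single run can err in either direction. The abstract's condition $N \epsilon^2 \gg m^{1/2}$ describes the regime in which a coincidence-based test becomes reliable, and the demo parameters above are illustrative only.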