A discrete distribution p over [n] is a k-histogram if its probability distribution function can be represented as a piecewise constant function with k pieces. Such a function is represented by a list of k intervals and k corresponding values. We consider the following problem: given a collection of samples from a distribution p, find a k-histogram that (approximately) minimizes the ℓ₂ distance to p. We give time- and sample-efficient algorithms for this problem. We further provide algorithms that distinguish distributions that are k-histograms from distributions that are ε-far from any k-histogram in the ℓ₁ distance and the ℓ₂ distance, respectively.
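To make the definitions concrete, the following minimal Python sketch stores a k-histogram as a list of k intervals with k values, expands it into a probability vector, and measures its ℓ₁ and ℓ₂ distance to the empirical distribution of a sample. It illustrates the objects in the problem statement only and is not the algorithm of the paper. The function names, the convention [n] = {1, ..., n} with inclusive interval endpoints, and the example data are all assumptions made for illustration.

```python
from collections import Counter
from math import sqrt


def histogram_to_pmf(n, intervals, values):
    """Expand a k-histogram over {1, ..., n} into a list of n probabilities.

    intervals: k pairs (lo, hi), inclusive, assumed to partition 1..n.
    values:    k per-element probabilities, one per interval.
    """
    pmf = [0.0] * n
    for (lo, hi), v in zip(intervals, values):
        for i in range(lo, hi + 1):
            pmf[i - 1] = v  # every element of an interval gets the same mass
    return pmf


def empirical_pmf(n, samples):
    """Empirical distribution of samples drawn from {1, ..., n}."""
    counts = Counter(samples)
    m = len(samples)
    return [counts[i] / m for i in range(1, n + 1)]


def l1_distance(p, q):
    """Sum of absolute differences between two probability vectors."""
    return sum(abs(a - b) for a, b in zip(p, q))


def l2_distance(p, q):
    """Euclidean distance between two probability vectors."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


# Hypothetical example: a 2-histogram over [4] and ten made-up samples.
h = histogram_to_pmf(4, [(1, 2), (3, 4)], [0.4, 0.1])
p_hat = empirical_pmf(4, [1, 1, 1, 2, 2, 3, 3, 4, 4, 4])

print("l1 distance:", l1_distance(h, p_hat))
print("l2 distance:", l2_distance(h, p_hat))
```

The empirical distribution stands in for the unknown p here. With few samples it can be far from p, which is why the number of samples needed is a central question for this problem.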