Elements of information theory
On the distributional complexity of disjointness
Theoretical Computer Science
Additive models, boosting, and inference for generalized divergences
COLT '99 Proceedings of the twelfth annual conference on Computational learning theory
Boosting as entropy projection
COLT '99 Proceedings of the twelfth annual conference on Computational learning theory
The space complexity of approximating the frequency moments
Journal of Computer and System Sciences
External memory algorithms
Prediction games and arcing algorithms
Neural Computation
Min-wise independent permutations
Journal of Computer and System Sciences - 30th annual ACM symposium on theory of computing
Space lower bounds for distance approximation in the data stream model
STOC '02 Proceedings of the thirty-fourth annual ACM symposium on Theory of computing
An Approximate L1-Difference Algorithm for Massive Data Streams
SIAM Journal on Computing
Logistic Regression, AdaBoost and Bregman Distances
Machine Learning
An elementary proof of a theorem of Johnson and Lindenstrauss
Random Structures & Algorithms
Finding Frequent Items in Data Streams
ICALP '02 Proceedings of the 29th International Colloquium on Automata, Languages and Programming
Frequency Estimation of Internet Packet Streams with Limited Space
ESA '02 Proceedings of the 10th Annual European Symposium on Algorithms
Comparing Data Streams Using Hamming Norms (How to Zero In)
IEEE Transactions on Knowledge and Data Engineering
Stable distributions, pseudorandom generators, embeddings and data stream computation
FOCS '00 Proceedings of the 41st Annual Symposium on Foundations of Computer Science
On the Impossibility of Dimension Reduction in ℓ_1
FOCS '03 Proceedings of the 44th Annual IEEE Symposium on Foundations of Computer Science
Optimal approximations of the frequency moments of data streams
Proceedings of the thirty-seventh annual ACM symposium on Theory of computing
An improved data stream summary: the count-min sketch and its applications
Journal of Algorithms
Simpler algorithm for estimating frequency moments of data streams
SODA '06 Proceedings of the seventeenth annual ACM-SIAM symposium on Discrete algorithms
Streaming and sublinear approximation of entropy and information distances
SODA '06 Proceedings of the seventeenth annual ACM-SIAM symposium on Discrete algorithms
A near-optimal algorithm for computing the entropy of a stream
SODA '07 Proceedings of the eighteenth annual ACM-SIAM symposium on Discrete algorithms
Declaring independence via the sketching of sketches
Proceedings of the nineteenth annual ACM-SIAM symposium on Discrete algorithms
Robust lower bounds for communication and stream computation
STOC '08 Proceedings of the fortieth annual ACM symposium on Theory of computing
Fast nearest neighbor retrieval for bregman divergences
Proceedings of the 25th international conference on Machine learning
On Low Distortion Embeddings of Statistical Distance Measures into Low Dimensional Spaces
DEXA '09 Proceedings of the 20th International Conference on Database and Expert Systems Applications
When comparing discrete probability distributions, the natural measures of similarity are not ℓ_p distances but information divergences such as Kullback-Leibler and Hellinger. This paper considers issues in constructing small-space sketches of distributions, a concept related to dimensionality reduction, such that these measures can be approximately computed from the sketches alone. The corresponding problems for ℓ_p distances are reasonably well understood through a series of results including Johnson and Lindenstrauss [27,18], Alon, Matias, and Szegedy [1], Indyk [24], and Brinkman and Charikar [8]. In contrast, almost no analogous results are known to date for sketching the information divergences used in statistics and learning theory.
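To make the objects in the abstract concrete, the following is a minimal sketch, not the paper's construction. It defines the two divergences named above and, for contrast, a Johnson-Lindenstrauss style random projection that approximately preserves the ℓ_2 distance between two distributions. The function names, the Gaussian projection, the sketch size `k`, and the normalization of the Hellinger distance (conventions differ by a constant factor) are all assumptions chosen for illustration.

```python
import math
import random


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) in nats.

    Returns infinity when p puts mass where q has none.
    """
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0.0:
            if qi == 0.0:
                return math.inf
            total += pi * math.log(pi / qi)
    return total


def hellinger_squared(p, q):
    """Squared Hellinger distance, taken here as sum_i (sqrt(p_i) - sqrt(q_i))^2.

    Assumption: no 1/2 factor; some texts include one.
    """
    return sum((math.sqrt(pi) - math.sqrt(qi)) ** 2 for pi, qi in zip(p, q))


def l2_distance(x, y):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def make_projection(n, k, seed=0):
    """k x n matrix of i.i.d. N(0, 1/k) entries (a JL-style linear sketch)."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(k)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(n)] for _ in range(k)]


def sketch(matrix, x):
    """Apply the linear sketch: returns matrix @ x as a list of length k."""
    return [sum(r * v for r, v in zip(row, x)) for row in matrix]


def normalize(weights):
    """Scale nonnegative weights so they sum to 1."""
    s = sum(weights)
    return [w / s for w in weights]


if __name__ == "__main__":
    n, k = 100, 256
    rng = random.Random(1)
    p = normalize([rng.random() for _ in range(n)])
    q = normalize([rng.random() for _ in range(n)])

    proj = make_projection(n, k, seed=2)
    true_l2 = l2_distance(p, q)
    est_l2 = l2_distance(sketch(proj, p), sketch(proj, q))

    print("KL(p || q)           :", kl_divergence(p, q))
    print("Hellinger^2(p, q)    :", hellinger_squared(p, q))
    print("l2 distance (exact)  :", true_l2)
    print("l2 distance (sketch) :", est_l2)
```

In this toy setting `k` exceeds `n`, so nothing is compressed; the point is only that the ℓ_2 distance can be estimated from the two sketches without access to `p` and `q`, whereas `kl_divergence` and `hellinger_squared` are computed here from the full distributions. Whether comparably small sketches exist for such divergences is the question the abstract raises.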