Sketching information divergences

Authors:
Sudipto Guha;Piotr Indyk;Andrew McGregor
Affiliations:
Dept. of Computer and Information Sciences, University of Pennsylvania, Philadelphia, USA 19104;Computer Science and Artificial Intelligence Lab, Massachusetts Institute of Technology, Cambridge, USA 02139;Information Theory & Applications Center, University of California, San Diego, USA 92109
Venue:
Machine Learning
Year:
2008

References (23):
Elements of information theory
On the distributional complexity of disjointness. Theoretical Computer Science
Additive models, boosting, and inference for generalized divergences. COLT '99 Proceedings of the twelfth annual conference on Computational learning theory
Boosting as entropy projection. COLT '99 Proceedings of the twelfth annual conference on Computational learning theory
The space complexity of approximating the frequency moments. Journal of Computer and System Sciences
Computing on data streams. External memory algorithms
Prediction games and arcing algorithms. Neural Computation
Min-wise independent permutations. Journal of Computer and System Sciences - 30th annual ACM symposium on theory of computing
Space lower bounds for distance approximation in the data stream model. STOC '02 Proceedings of the thirty-fourth annual ACM symposium on Theory of computing
An Approximate L1-Difference Algorithm for Massive Data Streams. SIAM Journal on Computing
Logistic Regression, AdaBoost and Bregman Distances. Machine Learning
Finding Frequent Items in Data Streams. ICALP '02 Proceedings of the 29th International Colloquium on Automata, Languages and Programming
Comparing Data Streams Using Hamming Norms (How to Zero In). IEEE Transactions on Knowledge and Data Engineering
Stable distributions, pseudorandom generators, embeddings and data stream computation. FOCS '00 Proceedings of the 41st Annual Symposium on Foundations of Computer Science
On the Impossibility of Dimension Reduction in ℓ_1. FOCS '03 Proceedings of the 44th Annual IEEE Symposium on Foundations of Computer Science
Optimal space lower bounds for all frequency moments. SODA '04 Proceedings of the fifteenth annual ACM-SIAM symposium on Discrete algorithms
An information statistics approach to data stream and communication complexity. Journal of Computer and System Sciences - Special issue on FOCS 2002
Optimal approximations of the frequency moments of data streams. Proceedings of the thirty-seventh annual ACM symposium on Theory of computing
An improved data stream summary: the count-min sketch and its applications. Journal of Algorithms
Simpler algorithm for estimating frequency moments of data streams. SODA '06 Proceedings of the seventeenth annual ACM-SIAM symposium on Discrete algorithms
Streaming and sublinear approximation of entropy and information distances. SODA '06 Proceedings of the seventeenth annual ACM-SIAM symposium on Discrete algorithms
A near-optimal algorithm for computing the entropy of a stream. SODA '07 Proceedings of the eighteenth annual ACM-SIAM symposium on Discrete algorithms
On Estimating Frequency Moments of Data Streams. APPROX '07/RANDOM '07 Proceedings of the 10th International Workshop on Approximation and the 11th International Workshop on Randomization, and Combinatorial Optimization. Algorithms and Techniques

Cited by (3):
Zero-one frequency laws. Proceedings of the forty-second ACM symposium on Theory of computing
Fast moment estimation in data streams in optimal space. Proceedings of the forty-third annual ACM symposium on Theory of computing
Testing Closeness of Discrete Distributions. Journal of the ACM (JACM)

Abstract

When comparing discrete probability distributions, natural measures of similarity are not ℓ_p distances but rather are information divergences such as Kullback-Leibler and Hellinger. This paper considers some of the issues related to constructing small-space sketches of distributions in the data-stream model, a concept related to dimensionality reduction, such that these measures can be approximated from the sketches. Related problems for ℓ_p distances are reasonably well understood via a series of results by Johnson and Lindenstrauss (Contemp. Math. 26:189-206, 1984), Alon et al. (J. Comput. Syst. Sci. 58(1):137-147, 1999), Indyk (IEEE Symposium on Foundations of Computer Science, pp. 202-208, 2000), and Brinkman and Charikar (IEEE Symposium on Foundations of Computer Science, pp. 514-523, 2003). In contrast, almost no analogous results are known to date about constructing sketches for the information divergences used in statistics and learning theory. Our main result is an impossibility result that shows that no small-space sketches exist for the multiplicative approximation of any commonly used f-divergences and Bregman divergences with the notable exceptions of ℓ_1 and ℓ_2 where small-space sketches exist. We then present data-stream algorithms for the additive approximation of a wide range of information divergences. Throughout, our emphasis is on providing general characterizations.
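
As an illustration of the quantities the abstract refers to, the following minimal sketch computes a few f-divergences exactly, with no sketching or streaming involved. It assumes the convention D_f(p, q) = sum_i p_i f(q_i / p_i) for a convex f with f(1) = 0, and it assumes both distributions are strictly positive so that every ratio is defined. The function names and the specific choices of f are illustrative assumptions, not notation taken from the paper.

```python
import math


def f_divergence(p, q, f):
    """Exact f-divergence sum_i p_i * f(q_i / p_i).

    Assumption: p and q have equal length and strictly positive entries,
    which sidesteps the boundary conventions needed when some p_i is zero.
    """
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    if any(x <= 0 for x in list(p) + list(q)):
        raise ValueError("this sketch assumes strictly positive entries")
    return sum(pi * f(qi / pi) for pi, qi in zip(p, q))


def l1_distance(p, q):
    # f(u) = |1 - u| gives sum_i |p_i - q_i|.
    return f_divergence(p, q, lambda u: abs(1.0 - u))


def kl_divergence(p, q):
    # f(u) = -log(u) gives sum_i p_i * log(p_i / q_i).
    return f_divergence(p, q, lambda u: -math.log(u))


def hellinger_divergence(p, q):
    # f(u) = (sqrt(u) - 1)^2 gives sum_i (sqrt(p_i) - sqrt(q_i))^2,
    # i.e. the squared form, without any normalizing constant.
    return f_divergence(p, q, lambda u: (math.sqrt(u) - 1.0) ** 2)
```

For example, `kl_divergence([0.5, 0.5], [0.25, 0.75])` evaluates the Kullback-Leibler divergence between two distributions on a two-element domain. Computing these values directly requires storing both distributions in full; the abstract's question is whether much smaller summaries can support comparable estimates.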