In most algorithmic applications that compare two distributions, information-theoretic distances are more natural than standard l_p norms. In this paper we design streaming and sublinear-time property testing algorithms for entropy and various information-theoretic distances.

Batu et al. posed the problem of property testing with respect to the Jensen-Shannon distance. We present optimal algorithms for estimating bounded, symmetric f-divergences (including the Jensen-Shannon divergence and the Hellinger distance) between distributions in various property testing frameworks. Along the way, we close a (log n)/H gap between the upper and lower bounds for estimating entropy H, yielding an optimal algorithm over all values of the entropy. In a data stream setting (sublinear space), we give the first algorithm for estimating the entropy of a distribution. Our algorithm runs in polylogarithmic space and yields an asymptotic constant-factor approximation scheme. An integral part of the algorithm is an interesting use of an F_0 (the number of distinct elements in a set) estimation algorithm; we also provide other results along the space/time/approximation tradeoff curve.

Our results have interesting structural implications that connect sublinear-time and space-constrained algorithms. The mediating model is the random-order streaming model, which assumes the input is a random permutation of a multiset and was first considered by Munro and Paterson in 1980. We show that any property testing algorithm in the combined oracle model for calculating a permutation-invariant function can be simulated in the random-order model in a single pass. This addresses a question raised by Feigenbaum et al. regarding the relationship between property testing and stream algorithms. Further, we give a polylog-space PTAS for estimating the entropy of a one-pass random-order stream. This bound cannot be achieved in the combined oracle (generalized property testing) model.
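For readers unfamiliar with the quantities named above, the following is a minimal sketch of their standard textbook definitions, computed exactly from fully known distributions. It is not the streaming or sublinear-time algorithm of the paper, which only sees samples or a stream. The function names are illustrative, and the base-2 logarithm and the 1/sqrt(2) normalization of the Hellinger distance are assumptions; conventions vary between authors.

```python
import math

def entropy(p):
    """Shannon entropy (in bits, an assumed convention) of a probability vector p."""
    return -sum(x * math.log2(x) for x in p if x > 0)

def jensen_shannon(p, q):
    """Jensen-Shannon divergence: H((p+q)/2) - (H(p) + H(q))/2."""
    m = [(a + b) / 2 for a, b in zip(p, q)]
    return entropy(m) - (entropy(p) + entropy(q)) / 2

def hellinger(p, q):
    """Hellinger distance, normalized (by assumption) to lie in [0, 1]."""
    s = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return math.sqrt(s / 2)

if __name__ == "__main__":
    p = [0.5, 0.25, 0.25]
    q = [0.25, 0.25, 0.5]
    print(entropy(p), jensen_shannon(p, q), hellinger(p, q))
```

Both distances are bounded and symmetric in their two arguments, which is the class of f-divergences the abstract refers to.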