Elements of information theory
Elements of information theory
The space complexity of approximating the frequency moments
Journal of Computer and System Sciences
Foundations of statistical natural language processing
Foundations of statistical natural language processing
On computing correlated aggregates over continual data streams
SIGMOD '01 Proceedings of the 2001 ACM SIGMOD international conference on Management of data
Space-efficient online computation of quantile summaries
SIGMOD '01 Proceedings of the 2001 ACM SIGMOD international conference on Management of data
Models and issues in data stream systems
Proceedings of the twenty-first ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Reductions in streaming algorithms, with an application to counting triangles in graphs
SODA '02 Proceedings of the thirteenth annual ACM-SIAM symposium on Discrete algorithms
Data streams: algorithms and applications
SODA '03 Proceedings of the fourteenth annual ACM-SIAM symposium on Discrete algorithms
Comparing Data Streams Using Hamming Norms (How to Zero In)
IEEE Transactions on Knowledge and Data Engineering
Efficient Approximation of Correlated Sums on Data Streams
IEEE Transactions on Knowledge and Data Engineering
Better streaming algorithms for clustering problems
Proceedings of the thirty-fifth annual ACM symposium on Theory of computing
Testing Random Variables for Independence and Identity
FOCS '01 Proceedings of the 42nd IEEE symposium on Foundations of Computer Science
Optimal space lower bounds for all frequency moments
SODA '04 Proceedings of the fifteenth annual ACM-SIAM symposium on Discrete algorithms
Incremental Clustering and Dynamic Information Retrieval
SIAM Journal on Computing
Optimal approximations of the frequency moments of data streams
Proceedings of the thirty-seventh annual ACM symposium on Theory of computing
Graph distances in the streaming model: the value of space
SODA '05 Proceedings of the sixteenth annual ACM-SIAM symposium on Discrete algorithms
Simpler algorithm for estimating frequency moments of data streams
SODA '06 Proceedings of the seventeenth annual ACM-SIAM symposium on Discrete algorithm
The space complexity of pass-efficient algorithms for clustering
SODA '06 Proceedings of the seventeenth annual ACM-SIAM symposium on Discrete algorithm
On graph problems in a semi-streaming model
Theoretical Computer Science - Automata, languages and programming: Algorithms and complexity (ICALP-A 2004)
Approximation and streaming algorithms for histogram construction problems
ACM Transactions on Database Systems (TODS)
Space- and time-efficient deterministic algorithms for biased quantiles over data streams
Proceedings of the twenty-fifth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Stable distributions, pseudorandom generators, embeddings, and data stream computation
Journal of the ACM (JACM)
Testing k-wise and almost k-wise independence
Proceedings of the thirty-ninth annual ACM symposium on Theory of computing
A near-optimal algorithm for computing the entropy of a stream
SODA '07 Proceedings of the eighteenth annual ACM-SIAM symposium on Discrete algorithms
Sketching information divergences
COLT'07 Proceedings of the 20th annual conference on Learning theory
Lower bounds for quantile estimation in random-order and multi-pass streaming
ICALP'07 Proceedings of the 34th international conference on Automata, Languages and Programming
Testing symmetric properties of distributions
STOC '08 Proceedings of the fortieth annual ACM symposium on Theory of computing
Measuring independence of datasets
Proceedings of the forty-second ACM symposium on Theory of computing
Proceedings of the forty-second ACM symposium on Theory of computing
Fast Manhattan sketches in data streams
Proceedings of the twenty-ninth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
1-pass relative-error Lp-sampling with applications
SODA '10 Proceedings of the twenty-first annual ACM-SIAM symposium on Discrete Algorithms
Near-optimal private approximation protocols via a black box transformation
Proceedings of the forty-third annual ACM symposium on Theory of computing
Fast moment estimation in data streams in optimal space
Proceedings of the forty-third annual ACM symposium on Theory of computing
Periodicity and cyclic shifts via linear sketches
APPROX'11/RANDOM'11 Proceedings of the 14th international workshop and 15th international conference on Approximation, randomization, and combinatorial optimization: algorithms and techniques
Compressed matrix multiplication
Proceedings of the 3rd Innovations in Theoretical Computer Science Conference
Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining
Testing Symmetric Properties of Distributions
SIAM Journal on Computing
Testing Closeness of Discrete Distributions
Journal of the ACM (JACM)
Compressed matrix multiplication
ACM Transactions on Computation Theory (TOCT) - Special issue on innovations in theoretical computer science 2012
Hi-index | 0.00 |
We consider the problem of identifying correlations in data streams. Surprisingly, our work seems to be the first to consider this natural problem. In the centralized model, we consider a stream of pairs (i,j) ∈ [n]2 whose frequencies define a joint distribution (X,Y). In the distributed model, each coordinate of the pair may appear separately in the stream. We present a range of algorithms for approximating to what extent X and Y are independent, i.e., how close the joint distribution is to the product of the marginals. We consider various measures of closeness including ℓ1, ℓ2, and the mutual information between X and Y. Our algorithms are based on "sketching sketches", i.e., composing small-space linear synopses of the distributions. Perhaps ironically, the biggest technical challenges that arise relate to ensuring that different components of our estimates are sufficiently independent.