Declaring independence via the sketching of sketches

  • Authors:
  • Piotr Indyk; Andrew McGregor

  • Affiliations:
  • -;-

  • Venue:
  • Proceedings of the nineteenth annual ACM-SIAM symposium on Discrete algorithms
  • Year:
  • 2008

Abstract

We consider the problem of identifying correlations in data streams. Surprisingly, our work seems to be the first to consider this natural problem. In the centralized model, we consider a stream of pairs (i,j) ∈ [n]² whose frequencies define a joint distribution (X,Y). In the distributed model, each coordinate of the pair may appear separately in the stream. We present a range of algorithms for approximating to what extent X and Y are independent, i.e., how close the joint distribution is to the product of the marginals. We consider various measures of closeness including ℓ₁, ℓ₂, and the mutual information between X and Y. Our algorithms are based on "sketching sketches", i.e., composing small-space linear synopses of the distributions. Perhaps ironically, the biggest technical challenges that arise relate to ensuring that different components of our estimates are sufficiently independent.
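
To make the quantities in the abstract concrete, the following is a minimal Python sketch that computes the three closeness measures exactly from a fully stored stream of pairs. It is not the paper's small-space sketching algorithm; it is only a brute-force baseline that such an algorithm would approximate. The function name independence_measures and the choice of base-2 logarithms for mutual information are assumptions made for illustration.

```python
from collections import Counter
from math import log2, sqrt


def independence_measures(pairs):
    """Exact, non-streaming baseline (hypothetical helper, not from the paper).

    Builds the empirical joint distribution of (X, Y) from a list of pairs
    (i, j) and compares it with the product of its marginals.
    """
    pairs = list(pairs)
    m = len(pairs)
    if m == 0:
        raise ValueError("stream is empty")

    joint = Counter(pairs)                  # frequency of each pair (i, j)
    px = Counter(i for i, _ in pairs)       # marginal frequencies of X
    py = Counter(j for _, j in pairs)       # marginal frequencies of Y

    l1 = 0.0
    l2_squared = 0.0
    # Cells outside support(X) x support(Y) have joint mass 0 and product
    # mass 0, so they contribute nothing and can be skipped.
    for i in px:
        for j in py:
            r = joint.get((i, j), 0) / m        # joint probability
            s = (px[i] / m) * (py[j] / m)       # product of marginals
            l1 += abs(r - s)
            l2_squared += (r - s) ** 2

    # Mutual information in bits (assumed base 2); only cells with positive
    # joint mass contribute.
    mi = 0.0
    for (i, j), c in joint.items():
        r = c / m
        s = (px[i] / m) * (py[j] / m)
        mi += r * log2(r / s)

    return {"l1": l1, "l2": sqrt(l2_squared), "mutual_information": mi}
```

For example, the stream [(0, 0), (1, 1)] is perfectly correlated and gives ℓ₁ = 1, ℓ₂ = 0.5, and one bit of mutual information, while [(0, 0), (0, 1), (1, 0), (1, 1)] is exactly independent and gives zero for all three. The baseline stores every distinct pair, which is the space cost a streaming algorithm would aim to avoid.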