Multiple Pass Streaming Algorithms for Learning Mixtures of Distributions in ${\mathbb R}^d$

  • Authors:
  • Kevin L. Chang

  • Affiliations:
  • Max Planck Institute for Computer Science, Saarbrücken, Germany

  • Venue:
  • ALT '07 Proceedings of the 18th international conference on Algorithmic Learning Theory
  • Year:
  • 2007

Abstract

We present a multiple pass streaming algorithm for learning the density function of a mixture of $k$ uniform distributions over rectangles (cells) in ${\mathbb R}^d$, for any $d > 0$. Our learning model is: samples drawn according to the mixture are placed in arbitrary order in a data stream that may only be accessed sequentially by an algorithm with a very limited random access memory space. Our algorithm makes $2\ell + 1$ passes, for any $\ell > 0$, and requires memory at most $\tilde O(\epsilon^{-2/\ell}k^2d^4+(2k)^d)$. This exhibits a strong memory-space tradeoff: a few more passes significantly lower its memory requirements, thus trading one of the two most important resources in streaming computation for the other. Chang and Kannan [1] first considered this problem for $d = 1, 2$. Our learning algorithm is especially appropriate for situations where massive data sets of samples are available, but practical computation with such large inputs requires very restricted models of computation.
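The tradeoff between passes and memory stated in the abstract can be illustrated numerically. The following is a minimal sketch, not the paper's algorithm: it only evaluates the expression inside the $\tilde O(\cdot)$ bound, ignoring the hidden constant and polylogarithmic factors. The function names `passes` and `memory_bound` and the sample parameter values are assumptions chosen for illustration.

```python
def passes(ell: int) -> int:
    """Number of passes over the stream for a given ell, as stated: 2*ell + 1."""
    return 2 * ell + 1


def memory_bound(eps: float, k: int, d: int, ell: int) -> float:
    """Evaluate eps^(-2/ell) * k^2 * d^4 + (2k)^d.

    This is only the expression inside the O-tilde bound; the hidden
    constant and polylogarithmic factors are ignored (an assumption
    made for illustration).
    """
    if ell < 1:
        raise ValueError("ell must be a positive integer")
    if not 0.0 < eps < 1.0:
        raise ValueError("eps must lie in (0, 1)")
    return eps ** (-2.0 / ell) * k**2 * d**4 + (2 * k) ** d


if __name__ == "__main__":
    # Hypothetical parameters: error 0.01, k = 3 components, d = 2 dimensions.
    for ell in (1, 2, 4):
        print(f"passes={passes(ell)}  bound={memory_bound(0.01, 3, 2, ell):.0f}")
```

Under these sample values, the first term shrinks as $\ell$ grows while the $(2k)^d$ term stays fixed, so additional passes help only until the second term dominates.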