On spectral learning of mixtures of distributions

  • Authors: Dimitris Achlioptas; Frank McSherry
  • Affiliations: Microsoft Research, One Microsoft Way, Redmond, WA; Microsoft Research, Mountain View, CA
  • Venue: COLT'05: Proceedings of the 18th Annual Conference on Learning Theory
  • Year: 2005

Abstract

We consider the problem of learning mixtures of distributions via spectral methods and derive a characterization of when such methods are useful. Specifically, given a mixture-sample, let $\bar\mu_{i}, \bar C_{i}, \bar w_{i}$ denote the empirical mean, covariance matrix, and mixing weight of the samples from the $i$-th component. We prove that a very simple algorithm, namely spectral projection followed by single-linkage clustering, properly classifies every point in the sample provided that each pair of means $\bar\mu_{i}, \bar\mu_{j}$ is well separated, in the sense that $\|\bar\mu_{i} - \bar\mu_{j}\|^{2}$ is at least $\|\bar C_{i}\|_{2}(1/\bar w_{i} + 1/\bar w_{j})$ plus a term that depends on the concentration properties of the distributions in the mixture. This second term is very small for many distributions, including Gaussians, log-concave distributions, and many others. As a result, we get the best known bounds for learning mixtures of arbitrary Gaussians in terms of the required mean separation. At the same time, we prove that there are many Gaussian mixtures $\{(\mu_i, C_i, w_i)\}$ such that each pair of means is separated by $\|C_i\|_{2}(1/w_i + 1/w_j)$, yet upon spectral projection the mixture collapses completely, i.e., all means and covariance matrices in the projected mixture are identical.
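The two-step algorithm named in the abstract, spectral projection followed by single-linkage clustering, is simple enough to sketch in a few lines. Below is a minimal Python illustration, not the authors' implementation: it assumes the number of components k is known, projects the centered sample onto its top-k singular subspace, and cuts the single-linkage dendrogram into k clusters.

    import numpy as np
    from scipy.cluster.hierarchy import fcluster, linkage

    def spectral_single_linkage(X, k):
        """Cluster an (n x d) mixture-sample X into k groups: project onto
        the top-k singular subspace of the centered data, then single-link."""
        Xc = X - X.mean(axis=0)                        # center the sample
        _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
        P = Xc @ Vt[:k].T                              # n x k spectral projection
        Z = linkage(P, method='single')                # single-linkage dendrogram
        return fcluster(Z, t=k, criterion='maxclust')  # cut into k clusters

    # Hypothetical usage: two well-separated spherical Gaussians in R^50.
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0, 1, (200, 50)),
                   rng.normal(8, 1, (200, 50))])
    labels = spectral_single_linkage(X, k=2)

The abstract's guarantee is precisely that, under the stated mean separation, the projection keeps the components far enough apart for the single-linkage step to classify every sample point correctly.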