Communications of the ACM
On the learnability of discrete distributions. STOC '94: Proceedings of the Twenty-Sixth Annual ACM Symposium on Theory of Computing.
Learning mixtures of arbitrary Gaussians. STOC '01: Proceedings of the Thirty-Third Annual ACM Symposium on Theory of Computing.
A two-round variant of EM for Gaussian mixtures. UAI '00: Proceedings of the 16th Conference on Uncertainty in Artificial Intelligence.
Learning mixtures of Gaussians. FOCS '99: Proceedings of the 40th Annual Symposium on Foundations of Computer Science.
A spectral algorithm for learning mixture models. Journal of Computer and System Sciences, special issue on FOCS 2002.
The geometry of logconcave functions and sampling algorithms. Random Structures & Algorithms.
The spectral method for general mixture models. SIAM Journal on Computing.
Isotropic PCA and affine-invariant clustering. FOCS '08: Proceedings of the 49th Annual IEEE Symposium on Foundations of Computer Science.
PAC learning axis-aligned mixtures of Gaussians with no separation assumption. COLT '06: Proceedings of the 19th Annual Conference on Learning Theory.
Analysis of perceptron-based active learning. COLT '05: Proceedings of the 18th Annual Conference on Learning Theory.
On spectral learning of mixtures of distributions. COLT '05: Proceedings of the 18th Annual Conference on Learning Theory.
Modeling high-dimensional data: technical perspective. Communications of the ACM.
Learning Poisson binomial distributions. STOC '12: Proceedings of the Forty-Fourth Annual ACM Symposium on Theory of Computing.
Effective principal component analysis. SISAP '12: Proceedings of the 5th International Conference on Similarity Search and Applications.
Learning mixtures of spherical Gaussians: moment methods and spectral decompositions. Proceedings of the 4th Conference on Innovations in Theoretical Computer Science.
Learning mixtures of arbitrary distributions over large discrete domains. Proceedings of the 5th Conference on Innovations in Theoretical Computer Science.
Given data drawn from a mixture of multivariate Gaussians, a basic problem is to accurately estimate the mixture parameters. We provide a polynomial-time algorithm for this problem for the case of two Gaussians in $n$ dimensions (even if they overlap), with provably minimal assumptions on the Gaussians and polynomial data requirements. In statistical terms, our estimator converges at an inverse polynomial rate; no such estimator (even an exponential-time one) was previously known for this problem (even in one dimension). Our algorithm reduces the $n$-dimensional problem to the one-dimensional problem, where the method of moments is applied. One technical challenge is proving that noisy estimates of the first six moments of a univariate mixture suffice to recover accurate estimates of the mixture parameters, as conjectured by Pearson (1894), and that these estimates in fact converge at an inverse polynomial rate. As a corollary, we can efficiently perform near-optimal clustering: when the overlap between the Gaussians is small, one can accurately cluster the data, and when the Gaussians have partial overlap, one can still accurately cluster those data points that lie outside the overlap region. A second consequence is a polynomial-time density estimation algorithm for arbitrary mixtures of two Gaussians, generalizing previous work on axis-aligned Gaussians (Feldman et al., 2006).
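To illustrate the method-of-moments idea behind the univariate step, the sketch below (not the paper's algorithm; parameter values and function names are illustrative) computes the theoretical raw moments of a two-Gaussian mixture from its parameters, using the standard recursion $M_k = \mu M_{k-1} + (k-1)\sigma^2 M_{k-2}$ for a single Gaussian, and checks that empirical moments of sampled data approach them. Inverting this moment map, given only the noisy empirical moments, is the estimation problem the abstract refers to.

```python
import numpy as np

def gaussian_raw_moments(mu, sigma, kmax):
    """Raw moments E[X^k], k = 0..kmax, of N(mu, sigma^2),
    via the recursion M_k = mu*M_{k-1} + (k-1)*sigma^2*M_{k-2}."""
    m = [1.0, float(mu)]
    for k in range(2, kmax + 1):
        m.append(mu * m[k - 1] + (k - 1) * sigma ** 2 * m[k - 2])
    return m[: kmax + 1]

def mixture_moments(w, mu1, s1, mu2, s2, kmax=6):
    """Raw moments of the mixture w*N(mu1, s1^2) + (1-w)*N(mu2, s2^2):
    a moment of a mixture is the mixture of the component moments."""
    m1 = gaussian_raw_moments(mu1, s1, kmax)
    m2 = gaussian_raw_moments(mu2, s2, kmax)
    return [w * a + (1 - w) * b for a, b in zip(m1, m2)]

# Draw samples from a (hypothetical) mixture and compare the first six
# empirical moments with the theoretical ones they converge to.
rng = np.random.default_rng(0)
w, mu1, s1, mu2, s2 = 0.4, -1.0, 1.0, 2.0, 0.5
n = 200_000
from_first = rng.random(n) < w  # latent component labels
x = np.where(from_first, rng.normal(mu1, s1, n), rng.normal(mu2, s2, n))

emp = [float(np.mean(x ** k)) for k in range(7)]
theo = mixture_moments(w, mu1, s1, mu2, s2)
```

For a standard normal, the recursion reproduces the familiar values $M_2 = 1$, $M_4 = 3$, $M_6 = 15$; the paper's result is that six such (noisy) moment estimates already pin down the five mixture parameters at an inverse polynomial rate.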