Learning mixtures of arbitrary distributions over large discrete domains
Proceedings of the 5th Conference on Innovations in Theoretical Computer Science
We consider the problem of learning mixtures of product distributions over discrete domains in the distribution learning framework introduced by Kearns et al. [Proceedings of the $26$th Annual Symposium on Theory of Computing (STOC), Montréal, QC, 1994, ACM, New York, pp. 273-282]. We give a $\operatorname{poly}(n/\epsilon)$-time algorithm for learning a mixture of $k$ arbitrary product distributions over the $n$-dimensional Boolean cube $\{0,1\}^n$ to accuracy $\epsilon$, for any constant $k$. Previous polynomial-time algorithms could achieve this only for $k = 2$ product distributions; our result answers an open question stated independently in [M. Cryan, Learning and Approximation Algorithms for Problems Motivated by Evolutionary Trees, Ph.D. thesis, University of Warwick, Warwick, UK, 1999] and [Y. Freund and Y. Mansour, Proceedings of the $12$th Annual Conference on Computational Learning Theory, 1999, pp. 183-192]. We further give evidence that no polynomial-time algorithm can succeed when $k$ is superconstant, by reduction from a difficult open problem in PAC (probably approximately correct) learning. Finally, we generalize our $\operatorname{poly}(n/\epsilon)$-time algorithm to learn any mixture of $k = O(1)$ product distributions over $\{0,1, \dots, b-1\}^n$, for any $b = O(1)$.
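To make the learning problem concrete, the following sketch illustrates the generative model the abstract refers to: a mixture of $k$ product distributions over $\{0,1\}^n$. This is not the paper's algorithm; the mixing weights and component means below are arbitrary values chosen for demonstration, and the final check only verifies that the empirical coordinate marginals of a large sample match the mixture's true marginals.

```python
import numpy as np

# Illustrative sketch (not the paper's algorithm): the generative model for a
# mixture of k product distributions over the Boolean cube {0,1}^n.
# All parameter values are made up for demonstration.

rng = np.random.default_rng(0)

k, n = 3, 8                           # k mixture components, n-dimensional cube
weights = rng.dirichlet(np.ones(k))   # mixing weights pi_1, ..., pi_k
means = rng.uniform(size=(k, n))      # means[j, i] = Pr[x_i = 1 | component j]

def sample(m):
    """Draw m examples: pick a component by its mixing weight, then set each
    coordinate independently (the defining property of a product distribution)."""
    comps = rng.choice(k, size=m, p=weights)
    return (rng.uniform(size=(m, n)) < means[comps]).astype(int)

X = sample(50_000)

# A learner sees only X; its goal is a hypothesis within total variation
# distance epsilon of the true mixture. As a sanity check, the empirical
# coordinate marginals should approach the weighted component means.
empirical = X.mean(axis=0)
expected = weights @ means
print(np.max(np.abs(empirical - expected)))  # small for large m
```

Note that matching single-coordinate marginals is far from sufficient for learning the mixture itself, which is part of what makes the problem hard for $k > 2$.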