Hidden Markov Models (HMMs) are one of the most fundamental and widely used statistical tools for modeling discrete time series. In general, learning HMMs from data is computationally hard (under cryptographic assumptions), and practitioners typically resort to search heuristics that suffer from the usual local-optima issues. We prove that under a natural separation condition (bounds on the smallest singular value of the HMM parameters), there is an efficient and provably correct algorithm for learning HMMs. The sample complexity of the algorithm does not explicitly depend on the number of distinct (discrete) observations; it depends on this quantity only implicitly, through spectral properties of the underlying HMM. This makes the algorithm particularly applicable to settings with a large number of observations, such as natural language processing, where the observation space is sometimes the set of words in a language. The algorithm is also simple, employing only a singular value decomposition and matrix multiplications.
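To make the "SVD plus matrix multiplications" claim concrete, below is a minimal numpy sketch in the spirit of the algorithm the abstract describes, following the standard observable-operator construction from the spectral-learning literature. The function names, the empirical-moment estimation from observation triples, and all implementation details here are illustrative assumptions, not taken verbatim from the paper.

```python
# Hedged sketch of spectral HMM learning: estimate low-order moments of
# consecutive observations, take one SVD, and form "observable operators"
# via pseudoinverses and matrix products. Assumed, not the paper's exact code.
import numpy as np

def learn_spectral_hmm(triples, n_obs, n_states):
    """Estimate observable operators from samples of triples (x1, x2, x3)
    of consecutive observations, each an integer in [0, n_obs)."""
    P1 = np.zeros(n_obs)                    # [P1]_i    = Pr[x1 = i]
    P21 = np.zeros((n_obs, n_obs))          # [P21]_ij  = Pr[x2 = i, x1 = j]
    P3x1 = np.zeros((n_obs, n_obs, n_obs))  # [P3x1[x]]_ij = Pr[x3 = i, x2 = x, x1 = j]
    for x1, x2, x3 in triples:
        P1[x1] += 1
        P21[x2, x1] += 1
        P3x1[x2, x3, x1] += 1
    P1 /= len(triples)
    P21 /= len(triples)
    P3x1 /= len(triples)

    # The single SVD: U holds the top-m left singular vectors of the
    # bigram matrix P21, where m = n_states.
    U, _, _ = np.linalg.svd(P21)
    U = U[:, :n_states]

    UP21_pinv = np.linalg.pinv(U.T @ P21)
    b1 = U.T @ P1                           # initial "state" vector
    binf = np.linalg.pinv(P21.T @ U) @ P1   # normalization vector
    # One m-by-m operator per symbol: B_x = (U^T P_{3,x,1}) (U^T P_{2,1})^+
    B = np.stack([U.T @ P3x1[x] @ UP21_pinv for x in range(n_obs)])
    return b1, binf, B

def sequence_prob(b1, binf, B, seq):
    """Estimated joint probability of an observation sequence:
    Pr[x_1, ..., x_t] ~= binf^T B_{x_t} ... B_{x_1} b1."""
    b = b1
    for x in seq:
        b = B[x] @ b
    return float(binf @ b)
```

Note how the sketch mirrors the abstract's two key points: the only nontrivial numerical steps are one SVD and matrix multiplications (plus pseudoinverses), and after the moment tables are formed, all learned quantities live in an m-dimensional space determined by the number of hidden states rather than by the size of the observation alphabet.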