We present a method for exact optimization and sampling from high-order Hidden Markov Models (HMMs), which are generally handled by approximation techniques. Motivated by adaptive rejection sampling and heuristic search, we propose a strategy based on sequentially refining a lower-order language model that is an upper bound on the true model we wish to decode and sample from. This allows us to build tractable variable-order HMMs. We extend the ARPA format for language models to enable efficient use of the max-backoff quantities required to compute the upper bound. We evaluate our approach on two problems: an SMS-retrieval task and a POS-tagging experiment using 5-gram models. Results show that the same approach can be used for both exact optimization and sampling, while explicitly constructing only a fraction of the total implicit state space.
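The key quantity behind the upper bound is the max-backoff: for each shortened context, the maximum probability the full model assigns to a word over all higher-order contexts that back off to it. A lower-order model scored with these maxima can never underestimate the true model, which is what makes the refinement strategy sound. The sketch below illustrates the idea under our own assumptions (the function name and data layout are hypothetical, not the paper's implementation):

```python
from collections import defaultdict

def max_backoff_bounds(ngram_logprobs):
    """For each (shortened context, word) pair, compute the maximum
    log-probability over all full contexts that back off to it.
    Hypothetical illustration: ngram_logprobs maps
    ((w1, ..., wk), w) -> log p(w | w1 ... wk).
    """
    bounds = defaultdict(lambda: float("-inf"))
    for (context, word), lp in ngram_logprobs.items():
        # Every suffix of the context (including the empty context)
        # is a lower-order context this n-gram backs off to.
        for k in range(len(context) + 1):
            key = (context[k:], word)
            if lp > bounds[key]:
                bounds[key] = lp
    return dict(bounds)

# Toy trigram model: the bigram-level bound for ("b",) -> "c" is the
# max over all first words, so it dominates every true trigram score.
trigrams = {
    (("a", "b"), "c"): -1.0,
    (("x", "b"), "c"): -0.5,
}
bounds = max_backoff_bounds(trigrams)
```

Decoding with `bounds[(("b",), "c")]` in place of the unknown trigram score yields an optimistic (admissible) estimate, which is exactly the property heuristic search and rejection-style sampling need; contexts are then expanded only where the bound is loose.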