Shannon's noisy-channel model, which describes how a corrupted message can be reconstructed, has been the cornerstone of much work in statistical language and speech processing. The model factors into two components: a language model that characterizes the original message and a channel model that describes the channel's corruption process. The standard approach to estimating the channel model's parameters is unsupervised maximum-likelihood estimation from the observation data, usually approximated with the Expectation-Maximization (EM) algorithm. In this paper we show that it is better to maximize the joint likelihood of the data at both ends of the noisy channel. We derive a corresponding bi-directional EM algorithm and show that it outperforms standard EM on two tasks: (1) translation using a probabilistic lexicon and (2) adaptation of a part-of-speech tagger between related languages.
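As a rough sketch of the objectives involved (the notation below is ours and only a plausible reading of the abstract, not the paper's own formulation): a hidden message $e$ passes through the channel and is observed as a corrupted string $f$, and decoding recovers
\[
  \hat{e} \;=\; \arg\max_{e}\; P(e)\,P_{\theta}(f \mid e),
\]
where $P(e)$ is the language model and $P_{\theta}(f \mid e)$ the channel model. Standard unsupervised EM fits $\theta$ from the observed side alone, maximizing the marginal likelihood
\[
  \theta^{*} \;=\; \arg\max_{\theta}\; \sum_{f \in \mathcal{F}} \log \sum_{e} P(e)\,P_{\theta}(f \mid e).
\]
One plausible form of the bi-directional objective (again an assumption on our part) adds a symmetric term so that the channel parameters must also explain monolingual data on the message side,
\[
  \theta^{*} \;=\; \arg\max_{\theta}\;
    \sum_{f \in \mathcal{F}} \log \sum_{e} P(e)\,P_{\theta}(f \mid e)
  \;+\;
    \sum_{e \in \mathcal{E}} \log \sum_{f} P(f)\,P_{\theta}(e \mid f),
\]
where exactly how the forward channel $P_{\theta}(f \mid e)$ and the reverse channel $P_{\theta}(e \mid f)$ share parameters is the substance of the paper's method and is not specified by this sketch.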