Building a large annotated corpus of English: the penn treebank
Computational Linguistics - Special issue on using large corpora: II
A maximum-entropy-inspired parser
NAACL 2000 Proceedings of the 1st North American chapter of the Association for Computational Linguistics conference
Automatic acquisition of a large subcategorization dictionary from corpora
ACL '93 Proceedings of the 31st annual meeting on Association for Computational Linguistics
COLING '88 Proceedings of the 12th conference on Computational linguistics - Volume 1
BLEU: a method for automatic evaluation of machine translation
ACL '02 Proceedings of the 40th Annual Meeting on Association for Computational Linguistics
Statistical phrase-based translation
NAACL '03 Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology - Volume 1
Factored language models and generalized parallel backoff
NAACL-Short '03 Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology: companion volume of the Proceedings of HLT-NAACL 2003--short papers - Volume 2
Minimum error rate training in statistical machine translation
ACL '03 Proceedings of the 41st Annual Meeting on Association for Computational Linguistics - Volume 1
EMNLP '02 Proceedings of the ACL-02 conference on Empirical methods in natural language processing - Volume 10
Discriminative syntactic language modeling for speech recognition
ACL '05 Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics
Distortion models for statistical machine translation
ACL-44 Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics
Continuous space language models
Computer Speech and Language
Moses: open source toolkit for statistical machine translation
ACL '07 Proceedings of the 45th Annual Meeting of the ACL on Interactive Poster and Demonstration Sessions
A unigram orientation model for statistical machine translation
HLT-NAACL-Short '04 Proceedings of HLT-NAACL 2004: Short Papers
11,001 new features for statistical machine translation
NAACL '09 Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics
Combining morphosyntactic enriched representation with n-best reranking in statistical translation
SSST '07 Proceedings of the NAACL-HLT 2007/AMTA Workshop on Syntax and Structure in Statistical Translation
CCG supertags in factored statistical machine translation
StatMT '07 Proceedings of the Second Workshop on Statistical Machine Translation
Towards better machine translation quality for the German--English language pairs
StatMT '08 Proceedings of the Third Workshop on Statistical Machine Translation
Improved language modeling for statistical machine translation
ParaText '05 Proceedings of the ACL Workshop on Building and Using Parallel Texts
Effective use of linguistic and contextual information for statistical machine translation
EMNLP '09 Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 1 - Volume 1
Cutting the long tail: hybrid language models for translation style adaptation
EACL '12 Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics
Encouraging consistent translation choices
NAACL HLT '12 Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
A class-based agreement model for generating accurately inflected translations
ACL '12 Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers - Volume 1
CoSyne: synchronizing multilingual wiki content
Proceedings of the Eighth Annual International Symposium on Wikis and Open Collaboration
Hi-index | 0.00 |
Part-of-speech language modeling is commonly used as a component in statistical machine translation systems, but there is mixed evidence that its usage leads to significant improvements. We argue that its limited effectiveness is due to the lack of lexicalization. We introduce a new approach that builds a separate local language model for each word and part-of-speech pair. The resulting models lead to more context-sensitive probability distributions and we also exploit the fact that different local models are used to estimate the language model probability of each word during decoding. Our approach is evaluated for Arabic- and Chinese-to-English translation. We show that it leads to statistically significant improvements for multiple test sets and also across different genres, when compared against a competitive baseline and a system using a part-of-speech model.