In this paper we investigate the challenges of applying statistical machine translation to meeting conversations, focusing on how modeling contextual factors, such as the larger discourse context and topic/domain information, affects translation performance. We describe the collection of a small corpus of parallel meeting data and the development of a statistical machine translation system in the absence of genre-matched training data, and we present a quantitative analysis of the translation errors that result from the lack of contextual modeling inherent in standard statistical machine translation systems. Finally, we demonstrate that the largest source of translation errors, the lack of topic/domain knowledge, can be addressed by applying document-level, unsupervised word sense disambiguation, which improves performance over the baseline system.