Improving statistical MT through morphological analysis

Authors:
Sharon Goldwater;David McClosky
Affiliations:
Brown University;Brown University
Venue:
HLT '05 Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing
Year:
2005

Citing 5
Cited 40

A systematic comparison of various statistical alignment models

Computational Linguistics
Improving SMT quality with morpho-syntactic analysis

COLING '00 Proceedings of the 18th conference on Computational linguistics - Volume 2
Czech-English dependency-based machine translation

EACL '03 Proceedings of the tenth conference on European chapter of the Association for Computational Linguistics - Volume 1
Statistical Machine Translation with Scarce Resources Using Morpho-syntactic Information

Computational Linguistics
Morphological analysis for statistical machine translation

HLT-NAACL-Short '04 Proceedings of HLT-NAACL 2004: Short Papers

Combination of Arabic preprocessing schemes for statistical machine translation

ACL-44 Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics
Modelling lexical redundancy for machine translation

ACL-44 Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics
Improved statistical machine translation using paraphrases

HLT-NAACL '06 Proceedings of the main conference on Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics
Improving phrase-based statistical machine translation with morphosyntactic transformation

Machine Translation
Chinese word segmentation and statistical machine translation

ACM Transactions on Speech and Language Processing (TSLP)
Statistical machine translation

ACM Computing Surveys (CSUR)
Joining linguistic and statistical methods for Spanish-to-Basque speech translation

Speech Communication
On the impact of morphology in English to Spanish statistical MT

Speech Communication
Boosting statistical machine translation by lemmatization and linear interpolation

ACL '07 Proceedings of the 45th Annual Meeting of the ACL on Interactive Poster and Demonstration Sessions
The SAWA corpus: a parallel corpus English - Swahili

AfLaT '09 Proceedings of the First Workshop on Language Technologies for African Languages
Context-based Arabic morphological analysis for machine translation

CoNLL '08 Proceedings of the Twelfth Conference on Computational Natural Language Learning
Predicting success in machine translation

EMNLP '08 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Arabic preprocessing schemes for statistical machine translation

NAACL-Short '06 Proceedings of the Human Language Technology Conference of the NAACL, Companion Volume: Short Papers
Minimum Bayes risk combination of translation hypotheses from alternative morphological decompositions

NAACL-Short '09 Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Companion Volume: Short Papers
Evaluating an agglutinative segmentation model for ParaMor

SigMorPhon '08 Proceedings of the Tenth Meeting of ACL Special Interest Group on Computational Morphology and Phonology
Exploring different representational units in English-to-Turkish statistical machine translation

StatMT '07 Proceedings of the Second Workshop on Statistical Machine Translation
Word error rates: decomposition over POS classes and applications for error analysis

StatMT '07 Proceedings of the Second Workshop on Statistical Machine Translation
The 'noisier channel': translation from morphologically complex languages

StatMT '07 Proceedings of the Second Workshop on Statistical Machine Translation
English-to-Czech factored machine translation

StatMT '07 Proceedings of the Second Workshop on Statistical Machine Translation
Morpho-syntactic information for automatic error analysis of statistical machine translation output

StatMT '06 Proceedings of the Workshop on Statistical Machine Translation
Improving morphology induction by learning spelling rules

IJCAI'09 Proceedings of the 21st international joint conference on Artificial intelligence
Symbolic-to-statistical hybridization: extending generation-heavy machine translation

Machine Translation
Statistical machine translation into a morphologically complex language

CICLing'08 Proceedings of the 9th international conference on Computational linguistics and intelligent text processing
Reduction of Morpho-Syntactic Features in Statistical Machine Translation of Highly Inflective Language

Informatica
Subword variation in text message classification

HLT '10 Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics
Syntax-to-morphology mapping in factored phrase-based statistical machine translation from English to Turkish

ACL '10 Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics
Exploiting morphology and local word reordering in English-to-Turkish phrase-based statistical machine translation

IEEE Transactions on Audio, Speech, and Language Processing
LIMSI's statistical translation systems for WMT'10

WMT '10 Proceedings of the Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR
A hybrid morpheme-word representation for machine translation of morphologically rich languages

EMNLP '10 Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing
Enhancing morphological alignment for translating highly inflected languages

COLING '10 Proceedings of the 23rd International Conference on Computational Linguistics
Nonparametric word segmentation for machine translation

COLING '10 Proceedings of the 23rd International Conference on Computational Linguistics
Using TectoMT as a preprocessing tool for phrase-based statistical machine translation

TSD'10 Proceedings of the 13th international conference on Text, speech and dialogue
Combining morpheme-based machine translation with post-processing morpheme prediction

HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1
Translating from morphologically complex languages: a paraphrase-based approach

HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1
A syntactic transformation model for statistical machine translation

ICCPOL'06 Proceedings of the 21st international conference on Computer Processing of Oriental Languages: beyond the orient: the research challenges ahead
Agreement constraints for statistical machine translation into German

WMT '11 Proceedings of the Sixth Workshop on Statistical Machine Translation
Orthographic and morphological processing for English-Arabic statistical machine translation

Machine Translation
Translation model of Myanmar phrases for statistical machine translation

ICIC'11 Proceedings of the 7th international conference on Advanced Intelligent Computing Theories and Applications: with aspects of artificial intelligence
Machine translation without words through substring alignment

ACL '12 Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers - Volume 1
Substring-based machine translation

Machine Translation

Abstract

In statistical machine translation, estimating word-to-word alignment probabilities for the translation model can be difficult due to the problem of sparse data: most words in a given corpus occur at most a handful of times. With a highly inflected language such as Czech, this problem can be particularly severe. In addition, much of the morphological variation seen in Czech words is not reflected in either the morphology or syntax of a language like English. In this work, we show that using morphological analysis to modify the Czech input can improve a Czech-English machine translation system. We investigate several different methods of incorporating morphological information, and show that a system that combines these methods yields the best results. Our final system achieves a BLEU score of .333, as compared to .270 for the baseline word-to-word system.
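
The sparse-data argument in the abstract can be illustrated with a small sketch. The abstract does not say which morphological transformations the authors use, so the code below assumes one plausible instance, replacing each Czech word form with its lemma, and uses a hand-written toy lemma table (`TOY_LEMMAS`, hypothetical) in place of a real morphological analyzer. The function names and the seven-token corpus are likewise illustrative and not taken from the paper.

```python
from collections import Counter

# Hypothetical toy lemma table (surface form -> lemma). A real system would
# obtain lemmas from a Czech morphological analyzer, not a hand-written dict.
TOY_LEMMAS = {
    "žena": "žena", "ženy": "žena", "ženu": "žena",
    "kniha": "kniha", "knihy": "kniha", "knihu": "kniha",
    "čte": "číst", "čtou": "číst",
}


def lemmatize(tokens, lemmas=TOY_LEMMAS):
    """Replace each token with its lemma; unknown tokens pass through unchanged."""
    return [lemmas.get(tok, tok) for tok in tokens]


def singleton_rate(tokens):
    """Return the fraction of distinct word types that occur exactly once."""
    counts = Counter(tokens)
    return sum(1 for c in counts.values() if c == 1) / len(counts)


# Illustrative corpus: every surface form is distinct, so every type is rare.
corpus = "žena čte knihu ženy čtou knihy ženu".split()
lemmatized = lemmatize(corpus)

print(len(set(corpus)), singleton_rate(corpus))          # 7 types, rate 1.0
print(len(set(lemmatized)), singleton_rate(lemmatized))  # 3 types, rate 0.0
```

In this toy corpus, lemmatization collapses seven word types that each occur once into three types that each occur at least twice, which gives an alignment model more evidence per type. It also discards inflectional information, which is one reason a system might combine several ways of using morphology rather than rely on lemmas alone.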