Morphological analysis for statistical machine translation

Authors:
Young-Suk Lee
Affiliations:
IBM T. J. Watson Research Center, Yorktown Heights, NY
Venue:
HLT-NAACL-Short '04 Proceedings of HLT-NAACL 2004: Short Papers
Year:
2004


Abstract

We present a novel morphological analysis technique that induces morphological and syntactic symmetry between two languages with highly asymmetrical morphological structures, with the aim of improving statistical machine translation quality. The technique presupposes fine-grained segmentation of each word in the morphologically rich language into a prefix(es)-stem-suffix(es) sequence, together with part-of-speech tagging of the parallel corpus. The algorithm identifies morphemes to be merged or deleted in the morphologically rich language to induce the desired morphological and syntactic symmetry. The technique significantly improves Arabic-to-English translation quality when applied to IBM Model 1 and Phrase Translation Models trained on corpora ranging in size from 3,500 to 3.3 million sentence pairs.
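
The merge-or-delete step described in the abstract can be illustrated with a minimal Python sketch. The abstract does not say how the decisions are derived, so the sketch simply assumes that a decision table already exists. The function name `apply_decisions`, the tag names, the decision table, and the transliterated example word are all hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Tuple

# A morpheme is (surface form, kind, POS tag); kind is "prefix", "stem" or "suffix".
Morpheme = Tuple[str, str, str]

def apply_decisions(word: List[Morpheme],
                    decisions: Dict[Tuple[str, str], str]) -> List[str]:
    """Rewrite one segmented word into output tokens.

    `decisions` maps (kind, POS tag) to "merge" or "delete"; anything not
    listed is kept as a separate token. Stems are always kept.
    This table is an assumption of the sketch, not the paper's method.
    """
    tokens: List[str] = []
    stem_index = None      # position of the stem in `tokens`, once seen
    pending_prefix = ""    # merged prefixes waiting to be glued onto the stem
    for surface, kind, pos in word:
        action = "keep" if kind == "stem" else decisions.get((kind, pos), "keep")
        if action == "delete":
            continue                              # drop the morpheme entirely
        if kind == "stem":
            tokens.append(pending_prefix + surface)
            pending_prefix = ""
            stem_index = len(tokens) - 1
        elif action == "merge" and kind == "prefix":
            pending_prefix += surface             # glue onto the upcoming stem
        elif action == "merge" and stem_index is not None:
            tokens[stem_index] += surface         # glue suffix onto the stem
        else:
            tokens.append(surface)                # keep as a separate token
    if pending_prefix:                            # no stem followed; keep the text
        tokens.append(pending_prefix)
    return tokens

# Hypothetical segmented word: w+ y+ ktb +wn (illustrative transliteration and tags).
example_word = [
    ("w", "prefix", "CONJ"),
    ("y", "prefix", "IV3MP"),
    ("ktb", "stem", "VERB"),
    ("wn", "suffix", "IVSUFF"),
]
# Hypothetical decision table: merge the verbal prefix, delete the verbal suffix.
example_decisions = {("prefix", "IV3MP"): "merge", ("suffix", "IVSUFF"): "delete"}

print(apply_decisions(example_word, example_decisions))
```

With an empty decision table every morpheme remains its own token, which corresponds to the fine-grained segmentation the technique starts from.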