Statistical Machine Translation with Scarce Resources Using Morpho-syntactic Information

Authors:
Sonja Nießen; Hermann Ney
Venue:
Computational Linguistics
Year:
2004

Abstract

In statistical machine translation, correspondences between words in the source and target languages are learned from parallel corpora, and often little or no linguistic knowledge is used to structure the underlying models. In particular, existing statistical machine translation systems often treat different inflected forms of the same lemma as if they were independent of one another. The bilingual training data can be exploited better by explicitly taking into account the interdependencies of related inflected forms. We propose the construction of hierarchical lexicon models on the basis of equivalence classes of words. In addition, we introduce sentence-level restructuring transformations that aim to assimilate the word order of related sentences. We have systematically investigated the amount of bilingual training data required to maintain an acceptable quality of machine translation. The combination of the suggested methods for improving translation quality in settings with scarce resources has been tested successfully: we were able to reduce the amount of bilingual training data to less than 10% of the original corpus while losing only 1.6% in translation quality. The improvements in translation quality are demonstrated on two German-English corpora taken from the Verbmobil task and the Nespole! task.
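The abstract does not give the estimation formulas. As a rough illustration of how equivalence classes let related inflected forms share statistics, the Python sketch below interpolates relative-frequency translation estimates across a hypothetical hierarchy running from the full word form down to the bare lemma. The word representation (form, lemma, ordered tag tuple), the convention of dropping tags from the end, the geometric weights controlled by `decay`, the assumption that word-aligned source/target pairs are already available, and the names `class_levels` and `HierarchicalLexicon` are all assumptions made for this example, not the paper's exact model.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

# Hypothetical word representation for this sketch: (full form, lemma, ordered tags).
Word = Tuple[str, str, Tuple[str, ...]]


def class_levels(word: Word) -> List[Tuple[str, ...]]:
    """Equivalence classes of `word`, from most specific (full form) to most general (lemma).

    Assumption: tags are ordered so that dropping them from the end yields coarser classes.
    """
    form, lemma, tags = word
    levels: List[Tuple[str, ...]] = [("form", form)]
    for k in range(len(tags), -1, -1):
        levels.append(("lemma", lemma) + tags[:k])
    return levels


class HierarchicalLexicon:
    """Toy lexicon p(target | source word) interpolated over equivalence classes."""

    def __init__(self, decay: float = 0.5) -> None:
        self.decay = decay  # assumed fixed geometric weight per level: decay ** level
        self.counts: Dict[Tuple[str, ...], Counter] = {}

    def train(self, aligned_pairs: Iterable[Tuple[Word, str]]) -> None:
        # Each aligned (source word, target word) pair is counted at every class level,
        # so all inflected forms of a lemma contribute to the shared coarser classes.
        for src, tgt in aligned_pairs:
            for cls in class_levels(src):
                self.counts.setdefault(cls, Counter())[tgt] += 1

    def prob(self, tgt: str, src: Word) -> float:
        num, norm = 0.0, 0.0
        for level, cls in enumerate(class_levels(src)):
            dist = self.counts.get(cls)
            if not dist:
                continue  # class never observed in training: leave it out of the mixture
            weight = self.decay ** level
            num += weight * dist[tgt] / sum(dist.values())
            norm += weight  # renormalize over observed levels so probabilities sum to 1
        return num / norm if norm > 0 else 0.0


# Toy word-aligned data (invented for illustration).
TOY_PAIRS = [
    (("komme", "kommen", ("VVFIN", "1", "Sg")), "come"),
    (("kommst", "kommen", ("VVFIN", "2", "Sg")), "come"),
    (("kommt", "kommen", ("VVFIN", "3", "Sg")), "comes"),
]

if __name__ == "__main__":
    lex = HierarchicalLexicon(decay=0.5)
    lex.train(TOY_PAIRS)
    query = ("kommen", "kommen", ("VVFIN", "1", "Pl"))  # inflected form unseen in training
    print(f"{lex.prob('come', query):.3f}")  # -> 0.857
```

Because the query form (first person plural) never occurs in the toy data, its estimate comes entirely from the coarser classes it shares with komme, kommst, and kommt. A plain full-form lexicon would assign it no translation at all. In practice the interpolation weights would presumably be tuned on held-out data rather than fixed.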