Improving SMT quality with morpho-syntactic analysis

Authors:
Sonja Nießen; Hermann Ney
Affiliations:
RWTH - University of Technology Aachen, Aachen, Germany; RWTH - University of Technology Aachen, Aachen, Germany
Venue:
COLING '00 Proceedings of the 18th conference on Computational linguistics - Volume 2
Year:
2000

Abstract

In the framework of statistical machine translation (SMT), correspondences between the words in the source and the target language are learned from bilingual corpora on the basis of so-called alignment models. Many statistical systems use little or no linguistic knowledge to structure the underlying models. In this paper we argue that training data is typically too small to represent the full range of phenomena in natural languages, and that SMT can benefit from explicitly introducing some knowledge about the languages under consideration. Improved translation results are demonstrated on two different German-English corpora.