Automatic construction of machine translation knowledge using translation literalness

Authors:
Kenji Imamura; Eiichiro Sumita; Yuji Matsumoto
Affiliations:
ATR Spoken Language Translation, Kyoto, Japan; ATR Spoken Language Translation, Kyoto, Japan; Nara Institute of Science and Technology, Ikoma-shi, Nara, Japan
Venue:
EACL '03 Proceedings of the tenth conference on European chapter of the Association for Computational Linguistics - Volume 1
Year:
2003

Abstract

When machine translation (MT) knowledge is automatically constructed from bilingual corpora, redundant rules are acquired because of translation variety. These rules increase ambiguity or cause incorrect MT results. To overcome this problem, we restrict the sentences used for knowledge extraction to "the appropriate bilingual sentences for the MT." In this paper, we propose a method that uses translation literalness to select appropriate sentences or phrases. The translation correspondence rate (TCR) is defined as the literalness measure. Based on the TCR, two automatic construction methods are tested. One filters the corpus before rule acquisition. The other splits the acquisition process into two phases: a bilingual sentence is divided into literal parts and the remaining parts, and a different generalization is applied to each. The effects are evaluated by MT quality, and about 4.9% of MT results were improved by the latter method.
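
The first method (filtering the corpus before rule acquisition) can be illustrated with a minimal Python sketch. The abstract does not give the TCR formula, so the ratio used here (twice the number of word links divided by the total number of words in both sentences) is an assumed form chosen only for illustration. The names `SentencePair`, `tcr`, and `filter_corpus`, the sample data, and the threshold value are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class SentencePair:
    """A bilingual sentence pair with word-level links (hypothetical structure)."""
    source: List[str]
    target: List[str]
    # Each link is (source word index, target word index), e.g. from a word aligner.
    links: List[Tuple[int, int]]


def tcr(pair: SentencePair) -> float:
    """Literalness score in [0, 1] for one-to-one links.

    ASSUMPTION: this ratio is an illustrative stand-in for the TCR;
    the abstract does not state the actual definition.
    """
    total_words = len(pair.source) + len(pair.target)
    if total_words == 0:
        return 0.0
    return 2 * len(pair.links) / total_words


def filter_corpus(corpus: List[SentencePair], threshold: float) -> List[SentencePair]:
    """Keep only pairs whose literalness score reaches the threshold."""
    return [pair for pair in corpus if tcr(pair) >= threshold]


# Toy data: one fairly literal pair and one free translation.
literal = SentencePair(
    source=["I", "want", "a", "ticket"],
    target=["w1", "w2", "w3", "w4"],
    links=[(0, 0), (1, 3), (3, 1)],
)
free = SentencePair(
    source=["How", "are", "you"],
    target=["w1", "w2", "w3", "w4", "w5"],
    links=[(2, 0)],
)

kept = filter_corpus([literal, free], threshold=0.5)  # threshold is arbitrary
```

In this sketch the filtered list would then be passed to rule acquisition in place of the full corpus. The second method in the abstract instead works on parts of each sentence and is not modeled here.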