When machine translation (MT) knowledge is constructed automatically from bilingual corpora, translation variety causes redundant rules to be acquired. These rules increase ambiguity or cause incorrect MT results. To overcome this problem, we restrict the sentences used for knowledge extraction to "the appropriate bilingual sentences for the MT." In this paper, we propose a method that uses translation literalness to select appropriate sentences or phrases, and we define the translation correspondence rate (TCR) as the literalness measure. Based on the TCR, two automatic construction methods are tested. The first filters the corpus before rule acquisition. The second splits the acquisition process into two phases: each bilingual sentence is divided into literal parts and the remaining parts, and a different generalization is applied to each. The effects are evaluated in terms of MT quality; the latter method improved about 4.9% of MT results.
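The corpus-filtering idea can be illustrated with a minimal sketch. The abstract does not give the TCR formula, so the `literalness` function below is a stand-in assumption: the fraction of words on both sides that take part in at least one word link. The `SentencePair` type, the `links` field, and the 0.5 threshold are likewise hypothetical names and values chosen for illustration, not taken from the paper.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class SentencePair:
    """A bilingual sentence pair with word links (hypothetical structure)."""
    source: List[str]
    target: List[str]
    # (source_index, target_index) pairs, e.g. from a dictionary or aligner
    links: List[Tuple[int, int]]


def literalness(pair: SentencePair) -> float:
    """Stand-in literalness score in [0, 1]; NOT the paper's TCR definition.

    Assumption: share of words on both sides covered by at least one link.
    """
    total = len(pair.source) + len(pair.target)
    if total == 0:
        return 0.0
    linked_src = {i for i, _ in pair.links}
    linked_tgt = {j for _, j in pair.links}
    return (len(linked_src) + len(linked_tgt)) / total


def filter_corpus(pairs: List[SentencePair], threshold: float = 0.5) -> List[SentencePair]:
    """Keep only pairs whose score meets the (assumed) threshold."""
    return [p for p in pairs if literalness(p) >= threshold]


literal = SentencePair(["the", "cat"], ["le", "chat"], [(0, 0), (1, 1)])
free = SentencePair(["it", "rains", "cats", "dogs"],
                    ["il", "pleut", "des", "cordes"], [(1, 1)])
kept = filter_corpus([literal, free])
```

In this sketch, filtering happens once, before any rule acquisition. The two-phase method described in the abstract would instead apply a score of this kind to parts of a sentence rather than to whole sentences.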