Forced derivation tree based model training to statistical machine translation

Authors:
Nan Duan;Mu Li;Ming Zhou
Affiliations:
Microsoft Research Asia;Microsoft Research Asia;Microsoft Research Asia
Venue:
EMNLP-CoNLL '12 Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning
Year:
2012



Abstract

A forced derivation tree (FDT) of a sentence pair {f, e} is a derivation tree that translates f into exactly its target translation e. In this paper, we present an approach that leverages the structured knowledge contained in FDTs to train component models for statistical machine translation (SMT) systems. We first describe how to generate different FDTs for each sentence pair in the training corpus, and then show how to infer the optimal FDTs based on their derivation and alignment qualities. As the first step in this line of research, we verify the effectiveness of our approach in a phrasal system based on bracketing transduction grammar (BTG) and propose four FDT-based component models. Experiments are carried out on large-scale English-to-Japanese and Chinese-to-English translation tasks, and significant improvements are reported in both translation quality and alignment quality.
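
To make the notion of a forced derivation concrete, the following is a minimal sketch of how one such tree could be found under a BTG with straight and inverted combination rules. It is an illustration only, not the generation or selection procedure from the paper: the function name forced_derivation, the toy phrase table, and the example sentence pair are all assumptions introduced here, and the search returns the first derivation found rather than an optimal one.

```python
from functools import lru_cache


def forced_derivation(f, e, phrase_table):
    """Return one BTG derivation tree that yields exactly e from f, or None.

    phrase_table is a set of (source_tuple, target_tuple) pairs
    (a hypothetical toy format, assumed for this sketch).
    """
    f, e = tuple(f), tuple(e)

    @lru_cache(maxsize=None)
    def derive(i, j, k, l):
        # Terminal rule: source span f[i:j] translates directly to e[k:l].
        if (f[i:j], e[k:l]) in phrase_table:
            return ("phrase", " ".join(f[i:j]), " ".join(e[k:l]))
        # Binary rules: split both spans and combine the two halves.
        for m in range(i + 1, j):
            for n in range(k + 1, l):
                # Straight: left source half maps to left target half.
                left, right = derive(i, m, k, n), derive(m, j, n, l)
                if left and right:
                    return ("straight", left, right)
                # Inverted: left source half maps to right target half.
                left, right = derive(i, m, n, l), derive(m, j, k, n)
                if left and right:
                    return ("inverted", left, right)
        return None

    return derive(0, len(f), 0, len(e))


# Toy English-to-Japanese example (invented data, for illustration only).
table = {
    (("he",), ("kare", "wa")),
    (("reads",), ("yomu",)),
    (("books",), ("hon", "o")),
}
tree = forced_derivation(
    ["he", "reads", "books"], ["kare", "wa", "hon", "o", "yomu"], table
)
print(tree)
```

In this toy case the only derivation that reproduces the target combines "reads" and "books" with an inverted rule, which is the kind of structural information (phrase segmentation plus reordering decisions) that an FDT exposes. A sentence pair with no derivation under the phrase table simply yields None.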