Efficient extraction of oracle-best translations from hypergraphs

Authors:
Zhifei Li; Sanjeev Khudanpur
Affiliations:
The Johns Hopkins University, Baltimore, MD; The Johns Hopkins University, Baltimore, MD
Venue:
NAACL-Short '09 Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Companion Volume: Short Papers
Year:
2009

Abstract

Hypergraphs are used in several syntax-inspired methods of machine translation to compactly encode exponentially many translation hypotheses. The hypotheses closest to given reference translations therefore cannot be found via brute force, particularly for popular measures of closeness such as BLEU. We develop a dynamic program for extracting the so-called oracle-best hypothesis from a hypergraph by viewing it as the problem of finding the most likely hypothesis under an n-gram language model trained only on the reference translations. We further identify and remove massive redundancies in the dynamic program state due to the sparsity of n-grams present in the reference translations, resulting in a very efficient program. We present runtime statistics for this program, and demonstrate successful application of the hypotheses thus found as the targets for discriminative training of translation system components.
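The state-reduction idea mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `reference_contexts` and `reduce_right_state` are hypothetical, and the sketch covers only the right-hand (suffix) side of a dynamic-program state. It assumes that a partial hypothesis needs to remember only the longest suffix of its last n-1 words that some reference n-gram could still extend; when the reference contains few n-grams, most suffixes shrink, so many partial hypotheses share one state.

```python
def reference_contexts(reference, max_order):
    """Collect every word sequence that some reference n-gram extends by one word.

    Hypothetical helper: a context is the first n-1 words of a reference
    n-gram, for n = 2..max_order.
    """
    contexts = set()
    for n in range(2, max_order + 1):
        for i in range(len(reference) - n + 1):
            contexts.add(tuple(reference[i:i + n - 1]))
    return contexts


def reduce_right_state(tokens, contexts, max_order):
    """Return the longest suffix of the last max_order-1 tokens that is a context.

    Words to the left of the returned suffix cannot take part in any future
    n-gram match against the reference, so they can be dropped from the state.
    """
    full_state = tuple(tokens[-(max_order - 1):]) if max_order > 1 else ()
    for start in range(len(full_state) + 1):
        suffix = full_state[start:]
        # The empty suffix is always acceptable: nothing needs remembering.
        if not suffix or suffix in contexts:
            return suffix


# Toy data, assumed for illustration only.
reference = "the cat sat on the mat".split()
contexts = reference_contexts(reference, max_order=4)

for partial in ["a dog sat on", "the cat ate", "a dog ate"]:
    print(partial, "->", reduce_right_state(partial.split(), contexts, 4))
```

In this toy case, "the cat ate" and "a dog ate" both reduce to the empty state, so a dynamic program would treat them as interchangeable with respect to future n-gram matches, whereas an unreduced state would keep their last three words and treat them as distinct.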