Assessing phrase-based translation models with oracle decoding

Authors:
Guillaume Wisniewski;Alexandre Allauzen;François Yvon
Affiliations:
Univ. Paris Sud, France;Univ. Paris Sud, France;Univ. Paris Sud, France
Venue:
EMNLP '10 Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing
Year:
2010

Citing 22
Cited 4

A systematic comparison of various statistical alignment models

Computational Linguistics
Fast and optimal decoding for machine translation

Artificial Intelligence
Fast decoding and optimal decoding for machine translation

ACL '01 Proceedings of the 39th Annual Meeting on Association for Computational Linguistics
Greedy decoding for statistical machine translation in almost linear time

NAACL '03 Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology - Volume 1
Statistical phrase-based translation

NAACL '03 Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology - Volume 1
Minimum error rate training in statistical machine translation

ACL '03 Proceedings of the 41st Annual Meeting on Association for Computational Linguistics - Volume 1
Integer linear programming inference for conditional random fields

ICML '05 Proceedings of the 22nd international conference on Machine learning
A discriminative global training algorithm for statistical MT

ACL-44 Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics
An end-to-end discriminative approach to machine translation

ACL-44 Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics
Local phrase reordering models for statistical machine translation

HLT '05 Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing
The complexity of phrase alignment problems

HLT-Short '08 Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics on Human Language Technologies: Short Papers
Moses: open source toolkit for statistical machine translation

ACL '07 Proceedings of the 45th Annual Meeting of the ACL on Interactive Poster and Demonstration Sessions
Translation as weighted deduction

EACL '09 Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics
Decomposability of translation metrics for improved evaluation and efficient algorithms

EMNLP '08 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Complexity of finding the BLEU-optimal hypothesis in a confusion network

EMNLP '08 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Efficient extraction of oracle-best translations from hypergraphs

NAACL-Short '09 Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Companion Volume: Short Papers
Comparing reordering constraints for SMT using efficient Bleu oracle computation

SSST '07 Proceedings of the NAACL-HLT 2007/AMTA Workshop on Syntax and Structure in Statistical Translation
Rich source-side context for statistical machine translation

StatMT '08 Proceedings of the Third Workshop on Statistical Machine Translation
Learning performance of a machine translation system: a statistical and computational analysis

StatMT '08 Proceedings of the Third Workshop on Statistical Machine Translation
Findings of the 2009 workshop on statistical machine translation

StatMT '09 Proceedings of the Fourth Workshop on Statistical Machine Translation
A systematic analysis of translation model search spaces

StatMT '09 Proceedings of the Fourth Workshop on Statistical Machine Translation
Word graphs for statistical machine translation

ParaText '05 Proceedings of the ACL Workshop on Building and Using Parallel Texts

Example-based paraphrasing for improved phrase-based statistical machine translation

EMNLP '10 Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing
Learning to translate: a statistical and computational analysis

Advances in Artificial Intelligence
Computing lattice BLEU oracle scores for machine translation

EACL '12 Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics
Lattice BLEU oracles in machine translation

ACM Transactions on Speech and Language Processing (TSLP)

Quantified Score

Hi-index	0.00

Visualization

Abstract

Extant Statistical Machine Translation (SMT) systems are very complex softwares, which embed multiple layers of heuristics and embark very large numbers of numerical parameters. As a result, it is difficult to analyze output translations and there is a real need for tools that could help developers to better understand the various causes of errors. In this study, we make a step in that direction and present an attempt to evaluate the quality of the phrase-based translation model. In order to identify those translation errors that stem from deficiencies in the phrase table (PT), we propose to compute the oracle BLEU-4 score, that is the best score that a system based on this PT can achieve on a reference corpus. By casting the computation of the oracle BLEU-1 as an Integer Linear Programming (ILP) problem, we show that it is possible to efficiently compute accurate lower-bounds of this score, and report measures performed on several standard benchmarks. Various other applications of these oracle decoding techniques are also reported and discussed.