N-gram posterior probability confidence measures for statistical machine translation: an empirical study

Authors:
Adrià Gispert; Graeme Blackwood; Gonzalo Iglesias; William Byrne
Affiliations:
Machine Intelligence Laboratory, Department of Engineering, Cambridge University, Cambridge, UK; IBM T.J. Watson Research, Yorktown Heights, USA 10598; Machine Intelligence Laboratory, Department of Engineering, Cambridge University, Cambridge, UK; Machine Intelligence Laboratory, Department of Engineering, Cambridge University, Cambridge, UK
Venue:
Machine Translation
Year:
2013

Abstract

We report an empirical study of n-gram posterior probability confidence measures for statistical machine translation (SMT). We first describe an efficient and practical algorithm for rapidly computing n-gram posterior probabilities from large translation word lattices. These probabilities are shown to be a good predictor of whether or not the n-gram is found in human reference translations, motivating their use as a confidence measure for SMT. Comprehensive n-gram precision and word coverage measurements are presented for a variety of different language pairs, domains and conditions. We analyze the effect on reference precision of using single or multiple references, and compare the precision of posteriors computed from k-best lists to those computed over the full evidence space of the lattice. We also demonstrate improved confidence by combining multiple lattices in a multi-source translation framework.
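
As a rough illustration of the quantities the abstract refers to, the sketch below computes n-gram posterior probabilities from a k-best list and then measures how often confidently predicted n-grams appear in reference translations. It is a minimal sketch under stated assumptions, not the lattice algorithm described in the paper: the posterior of an n-gram is taken to be the normalized probability mass of the hypotheses that contain it, and the function names (`kbest_ngram_posteriors`, `reference_precision`), the toy hypotheses, the scores, and the thresholds are all hypothetical.

```python
import math
from collections import defaultdict


def ngrams(tokens, n):
    """Return the set of distinct n-grams (as tuples) in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def kbest_ngram_posteriors(kbest, n):
    """Approximate n-gram posteriors from a k-best list.

    kbest is a list of (tokens, log_score) pairs. Scores are normalized
    over the list, and each n-gram receives the total normalized mass of
    the hypotheses that contain it at least once (an assumed definition).
    """
    max_log = max(score for _, score in kbest)
    # Subtract the maximum before exponentiating for numerical stability.
    weights = [math.exp(score - max_log) for _, score in kbest]
    total = sum(weights)
    posteriors = defaultdict(float)
    for (tokens, _), weight in zip(kbest, weights):
        for u in ngrams(tokens, n):
            posteriors[u] += weight / total
    return dict(posteriors)


def reference_precision(posteriors, references, n, threshold):
    """Fraction of n-grams with posterior >= threshold found in any reference.

    Returns None when no n-gram reaches the threshold.
    """
    ref_ngrams = set()
    for ref in references:
        ref_ngrams |= ngrams(ref, n)
    kept = [u for u, p in posteriors.items() if p >= threshold]
    if not kept:
        return None
    return sum(u in ref_ngrams for u in kept) / len(kept)


# Toy data (hypothetical): three hypotheses with model probabilities
# 0.5, 0.3 and 0.2, and a single reference translation.
kbest = [
    ("the cat sat".split(), math.log(0.5)),
    ("the cat sits".split(), math.log(0.3)),
    ("a cat sat".split(), math.log(0.2)),
]
references = ["the cat sat down".split()]

bigram_posteriors = kbest_ngram_posteriors(kbest, n=2)
precision_high = reference_precision(bigram_posteriors, references, 2, 0.5)
precision_low = reference_precision(bigram_posteriors, references, 2, 0.25)
```

In this toy setting, raising the threshold keeps only the bigrams shared by most of the probability mass, so precision against the reference goes up while fewer bigrams are covered. A lattice-based computation would sum over all paths rather than only the k listed hypotheses.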