Error detection for statistical machine translation using linguistic features

Authors:
Deyi Xiong;Min Zhang;Haizhou Li
Affiliations:
Institute for Infocomm Research, Connexis, Singapore;Institute for Infocomm Research, Connexis, Singapore;Institute for Infocomm Research, Connexis, Singapore
Venue:
ACL '10 Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics
Year:
2010

Abstract

Automatic error detection is desirable as a post-processing step for improving machine translation quality. Previous work has relied largely on confidence estimation with system-based features, such as word posterior probabilities calculated from N-best lists or word lattices. We propose incorporating into error detection two groups of linguistic features, lexical and syntactic, which convey information from outside the machine translation system. We use a maximum entropy classifier to predict translation errors by integrating the word posterior probability feature with the linguistic features. The experimental results show that 1) linguistic features alone outperform word posterior probability-based confidence estimation in error detection; and 2) linguistic features provide complementary information when combined with word confidence scores, and together they reduce the classification error rate by 18.52% and improve the F measure by 16.37%.
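The abstract describes two computational steps: word posterior probabilities estimated from an N-best list, and a maximum entropy classifier that combines such probabilities with linguistic features and is judged by classification error rate and F measure. The Python sketch below illustrates both under simplifying assumptions that do not come from the paper. A word's posterior is computed position-independently (a word counts if it appears anywhere in a hypothesis), and hypothesis scores are treated as log-probabilities normalized with a softmax. The binary maximum entropy model is written in its equivalent logistic form and trained by plain gradient ascent rather than with a maxent toolkit. The "linguistic feature" is a single hypothetical binary cue. The function names (`word_posteriors`, `train_maxent`, `error_rate_and_f`) and all toy data are illustrative only.

```python
import math
from collections import defaultdict


def word_posteriors(nbest):
    """Position-independent word posterior probabilities from an N-best list.

    nbest: list of (tokens, log_score) pairs. A word's posterior is the share of
    normalized hypothesis probability held by the hypotheses that contain it.
    """
    top = max(score for _, score in nbest)
    weights = [math.exp(score - top) for _, score in nbest]  # stable softmax numerators
    total = sum(weights)
    post = defaultdict(float)
    for (tokens, _), w in zip(nbest, weights):
        for word in set(tokens):  # count each hypothesis at most once per word
            post[word] += w / total
    return dict(post)


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_maxent(X, y, epochs=1000, lr=0.5):
    """Binary maximum entropy model p(error | x) = sigmoid(b + w.x),
    fit by full-batch gradient ascent on the conditional log-likelihood."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            residual = yi - sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi)))
            gw = [g + residual * xj for g, xj in zip(gw, xi)]
            gb += residual
        w = [wj + lr * g / n for wj, g in zip(w, gw)]
        b += lr * gb / n
    return w, b


def predict(w, b, x):
    # Label 1 (error) when the model's error probability is at least 0.5.
    return 1 if sigmoid(b + sum(wj * xj for wj, xj in zip(w, x))) >= 0.5 else 0


def error_rate_and_f(gold, pred):
    """Classification error rate and F measure on the 'error' class (label 1)."""
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    cer = sum(1 for g, p in zip(gold, pred) if g != p) / len(gold)
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return cer, f


# Toy N-best list with hypothetical log scores.
nbest = [("the cat sat".split(), -1.0),
         ("the cat sit".split(), -1.5),
         ("a cat sat".split(), -2.0)]
post = word_posteriors(nbest)  # "cat" is in every hypothesis, so post["cat"] is ~1.0

# Hypothetical per-word feature vectors: [word posterior, binary linguistic cue];
# label 1 = translation error, 0 = correct word.
X = [[0.9, 1.0], [0.8, 1.0], [0.4, 1.0], [0.6, 0.0], [0.3, 0.0], [0.2, 0.0]]
y = [0, 0, 0, 1, 1, 1]
w, b = train_maxent(X, y)
cer, f1 = error_rate_and_f(y, [predict(w, b, x) for x in X])
```

In this toy data the posterior feature alone cannot separate the classes (a correct word at 0.4 scores below an erroneous word at 0.6), while the binary cue can, a miniature of the kind of complementarity the abstract reports. With real data, one feature vector would be extracted for each word of the MT output.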