Automatic evaluation metrics for machine translation are usually compared at the corpus level, using correlation statistics such as Pearson's product-moment correlation coefficient or Spearman's rank-order correlation coefficient between human scores and automatic scores. However, such comparisons rely on human judgments of translation qualities such as adequacy and fluency. Unfortunately, these judgments are often inconsistent and very expensive to acquire. In this paper, we introduce Orange, a new method for evaluating automatic machine translation evaluation metrics automatically, with no human involvement beyond providing a set of reference translations. We also report the results of using Orange to compare several existing automatic metrics and three new ones.
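For readers unfamiliar with the conventional comparison that the abstract contrasts Orange with, the following is a minimal sketch of computing Pearson's and Spearman's coefficients between human scores and automatic metric scores. It does not implement Orange itself. The function names and the score lists are hypothetical and chosen only for illustration; they are not data or code from the paper.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson's product-moment correlation between two equal-length lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


def average_ranks(values):
    """1-based ranks in ascending order; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the run of items tied with the item at position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman's rank-order correlation: Pearson's r computed on the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Hypothetical scores for five systems (illustrative values only).
human_scores = [3.2, 2.1, 4.0, 3.6, 1.5]       # e.g. averaged adequacy ratings
metric_scores = [0.31, 0.22, 0.35, 0.40, 0.10]  # e.g. an automatic metric

print("Pearson: ", pearson(human_scores, metric_scores))
print("Spearman:", spearman(human_scores, metric_scores))
```

Under this scheme, a metric with a higher correlation is taken to agree better with human judges, which is why the comparison depends on the costly human scores that Orange is designed to avoid.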