Building a large annotated corpus of English: the Penn Treebank
Computational Linguistics - Special issue on using large corpora: II
Precision and recall of machine translation
NAACL-Short '03 Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology: companion volume of the Proceedings of HLT-NAACL 2003--short papers - Volume 2
Filtering-Ranking Perceptron Learning for Partial Parsing
Machine Learning
ACL '04 Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics
Coarse-to-fine n-best parsing and MaxEnt discriminative reranking
ACL '05 Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics
QARLA: a framework for the evaluation of text summarization systems
ACL '05 Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics
ORANGE: a method for evaluating automatic evaluation metrics for machine translation
COLING '04 Proceedings of the 20th International Conference on Computational Linguistics
A robust combination strategy for semantic role labeling
HLT '05 Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing
MT evaluation: human-like vs. human acceptable
COLING-ACL '06 Proceedings of the COLING/ACL Main Conference Poster Sessions
Automatic evaluation of machine translation quality using n-gram co-occurrence statistics
HLT '02 Proceedings of the Second International Conference on Human Language Technology Research
Manual and automatic evaluation of machine translation between European languages
StatMT '06 Proceedings of the Workshop on Statistical Machine Translation
Evaluating machine translation with LFG dependencies
Machine Translation
A re-examination on features in regression-based approach to automatic MT evaluation
HLT-SRWS '08 Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics on Human Language Technologies: Student Research Workshop
COLING '08 Proceedings of the 22nd International Conference on Computational Linguistics - Volume 1
Decomposability of translation metrics for improved evaluation and efficient algorithms
EMNLP '08 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Semantic roles for SMT: a hybrid two-pass model
NAACL-Short '09 Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Companion Volume: Short Papers
(Meta-) evaluation of machine translation
StatMT '07 Proceedings of the Second Workshop on Statistical Machine Translation
A smorgasbord of features for automatic MT evaluation
StatMT '08 Proceedings of the Third Workshop on Statistical Machine Translation
Findings of the 2009 workshop on statistical machine translation
StatMT '09 Proceedings of the Fourth Workshop on Statistical Machine Translation
On the robustness of syntactic and semantic features for automatic MT evaluation
StatMT '09 Proceedings of the Fourth Workshop on Statistical Machine Translation
The contribution of linguistic features to automatic machine translation evaluation
ACL '09 Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 1
Cross-lingual annotation projection of semantic roles
Journal of Artificial Intelligence Research
ATEC: automatic evaluation of machine translation via word choice and word order
Machine Translation
MaxSim: performance and effects of translation fluency
Machine Translation
Metrics for MT evaluation: evaluating reordering
Machine Translation
Tackling sparse data issue in machine translation evaluation
ACLShort '10 Proceedings of the ACL 2010 Conference Short Papers
Document-level automatic MT evaluation based on discourse representations
WMT '10 Proceedings of the Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR
All in strings: a powerful string-based automatic MT evaluation metric with multiple granularities
COLING '10 Proceedings of the 23rd International Conference on Computational Linguistics: Posters
Improvement of machine translation evaluation by simple linguistically motivated features
Journal of Computer Science and Technology - Special issue on natural language processing
HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1
Structured vs. flat semantic role representations for machine translation evaluation
SSST-5 Proceedings of the Fifth Workshop on Syntax, Semantics and Structure in Statistical Translation
Approximating a deep-syntactic metric for MT evaluation and tuning
WMT '11 Proceedings of the Sixth Workshop on Statistical Machine Translation
TINE: a metric to assess MT adequacy
WMT '11 Proceedings of the Sixth Workshop on Statistical Machine Translation
Corroborating text evaluation results with heterogeneous measures
EMNLP '11 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Journal of the American Society for Information Science and Technology
Unsupervised vs. supervised weight estimation for semantic MT evaluation metrics
SSST-6 '12 Proceedings of the Sixth Workshop on Syntax, Semantics and Structure in Statistical Translation
Semantic textual similarity for MT evaluation
WMT '12 Proceedings of the Seventh Workshop on Statistical Machine Translation
DCU-Symantec submission for the WMT 2012 quality estimation task
WMT '12 Proceedings of the Seventh Workshop on Statistical Machine Translation
Match without a referee: evaluating MT adequacy without reference translations
WMT '12 Proceedings of the Seventh Workshop on Statistical Machine Translation
Fully automatic semantic MT evaluation
WMT '12 Proceedings of the Seventh Workshop on Statistical Machine Translation
Statistical machine translation enhancements through linguistic levels: A survey
ACM Computing Surveys (CSUR)
Evaluation results recently reported by Callison-Burch et al. (2006) and Koehn and Monz (2006) revealed that, in certain cases, the BLEU metric may not be a reliable indicator of MT quality. This happens, for instance, when the systems under evaluation are based on different paradigms and therefore do not share the same lexicon. The reason is that, while MT quality has many dimensions, BLEU limits its scope to the lexical one. In this work, we suggest using metrics that take into account linguistic features at more abstract levels. We provide experimental results showing that metrics based on deeper linguistic information (syntactic/shallow-semantic) are able to produce more reliable system rankings than metrics based on lexical matching alone, especially when the systems under evaluation are of a different nature.
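The lexical limitation described above can be illustrated with a minimal sketch of clipped n-gram precision, the core matching step of BLEU (the function name and example sentences below are illustrative, not taken from the paper): a candidate that paraphrases the reference with a different lexicon scores near zero despite being adequate.

```python
from collections import Counter

def ngram_precision(candidate, reference, n):
    """Clipped n-gram precision: fraction of candidate n-grams that
    also occur in the reference (each reference n-gram can be matched
    at most as many times as it appears there)."""
    cand = candidate.split()
    ref = reference.split()
    cand_ngrams = Counter(tuple(cand[i:i + n]) for i in range(len(cand) - n + 1))
    ref_ngrams = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
    matched = sum(min(c, ref_ngrams[g]) for g, c in cand_ngrams.items())
    total = sum(cand_ngrams.values())
    return matched / total if total else 0.0

reference = "the cat sat on the mat"
same_lexicon = "the cat sat on the mat"
paraphrase = "a feline rested upon the rug"  # adequate, but shares only "the"

print(ngram_precision(same_lexicon, reference, 1))  # 1.0
print(ngram_precision(paraphrase, reference, 1))    # 1/6: only "the" matches
```

A purely lexical metric ranks the paraphrase far below the verbatim candidate, which is the failure mode the syntactic and shallow-semantic metrics studied here are meant to mitigate.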