References:
- A new quantitative quality measure for machine translation systems. COLING '92: Proceedings of the 14th Conference on Computational Linguistics, Volume 2.
- BLEU: a method for automatic evaluation of machine translation. ACL '02: Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics.
- A unified framework for automatic evaluation using n-gram co-occurrence statistics. ACL '04: Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics.
- Automatic evaluation of machine translation quality using n-gram co-occurrence statistics. HLT '02: Proceedings of the Second International Conference on Human Language Technology Research.
- Using multiple edit distances to automatically grade outputs from machine translation systems. IEEE Transactions on Audio, Speech, and Language Processing.
- Statistical machine translation. ACM Computing Surveys (CSUR).
- Regression for machine translation evaluation at the sentence level. Machine Translation.
- Automated metrics for speech translation. PerMIS '09: Proceedings of the 9th Workshop on Performance Metrics for Intelligent Systems.
- Linguistic measures for automatic machine translation evaluation. Machine Translation.
- MEDITE: a unilingual textual aligner. FinTAL '06: Proceedings of the 5th International Conference on Advances in Natural Language Processing.
- Corroborating text evaluation results with heterogeneous measures. EMNLP '11: Proceedings of the Conference on Empirical Methods in Natural Language Processing.
- Effective co-reference resolution in clinical text. IEA/AIE '12: Proceedings of the 25th International Conference on Industrial Engineering and Other Applications of Applied Intelligent Systems.
We introduce BLANC, a family of dynamic, trainable evaluation metrics for machine translation. Flexible, parametrized models can be learned from past data and automatically optimized to correlate well with human judgments for different criteria (e.g. adequacy, fluency) using different correlation measures. Towards this end, we discuss ACS (all common skip-ngrams), a practical algorithm with trainable parameters that estimates reference-candidate translation overlap by computing a weighted sum of all common skip-ngrams in polynomial time. We show that the BLEU and ROUGE metric families are special cases of BLANC, and we compare correlations with human judgments across these three metric families. We analyze the algorithmic complexity of ACS and argue that it is more powerful in modeling both local meaning and sentence-level structure, while offering the same practicality as the established algorithms it generalizes.
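The full ACS algorithm carries trainable parameters (e.g. skip-gap penalties), which are not reproduced here. As a rough illustration of the core counting idea only, the following sketch uses a standard dynamic program to count, in polynomial time, all common skip-ngrams (pairs of matching index tuples) between a candidate and a reference, then combines the counts with a hypothetical per-length weight vector. The function names `common_skip_ngrams` and `acs_score`, the uniform gap-agnostic weighting, and the self-overlap normalization are illustrative assumptions, not the paper's implementation.

```python
def common_skip_ngrams(cand, ref, max_n):
    """Count, for each n = 1..max_n, the number of common skip-ngrams
    between token lists cand and ref, i.e. pairs of strictly increasing
    index tuples whose tokens match position by position.

    Dynamic program: N[i][j][k] = number of matching k-tuples drawn from
    cand[:i] and ref[:j]. Runs in O(len(cand) * len(ref) * max_n).
    NOTE: gap sizes are ignored here; the paper's ACS weights them.
    """
    C, R = len(cand), len(ref)
    N = [[[0] * (max_n + 1) for _ in range(R + 1)] for _ in range(C + 1)]
    for i in range(C + 1):
        for j in range(R + 1):
            N[i][j][0] = 1  # the empty skip-ngram always matches
    for i in range(1, C + 1):
        for j in range(1, R + 1):
            for k in range(1, max_n + 1):
                # inclusion-exclusion over shorter prefixes...
                N[i][j][k] = (N[i - 1][j][k] + N[i][j - 1][k]
                              - N[i - 1][j - 1][k])
                # ...plus tuples ending exactly at (i-1, j-1) on a match
                if cand[i - 1] == ref[j - 1]:
                    N[i][j][k] += N[i - 1][j - 1][k - 1]
    return [N[C][R][k] for k in range(1, max_n + 1)]


def acs_score(cand, ref, weights):
    """Hypothetical weighted overlap: sum_k w_k * (#common skip-k-grams),
    each count normalized by the reference's self-overlap so a perfect
    match with uniform-sum-1 weights scores 1.0."""
    counts = common_skip_ngrams(cand, ref, len(weights))
    norms = common_skip_ngrams(ref, ref, len(weights))
    return sum(w * c / n for w, c, n in zip(weights, counts, norms) if n)
```

Restricting `max_n` to contiguous n-grams of one length recovers a BLEU-style count, while `max_n = 2` with unit weights counts ROUGE-S skip-bigrams, which is the sense in which BLANC generalizes both families.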