References:
- A new quantitative quality measure for machine translation systems. COLING '92: Proceedings of the 14th Conference on Computational Linguistics, Volume 2.
- BLEU: a method for automatic evaluation of machine translation. ACL '02: Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics.
- A unified framework for automatic evaluation using n-gram co-occurrence statistics. ACL '04: Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics.
- Automatic evaluation of machine translation quality using n-gram co-occurrence statistics. HLT '02: Proceedings of the Second International Conference on Human Language Technology Research.
- Using multiple edit distances to automatically grade outputs from machine translation systems. IEEE Transactions on Audio, Speech, and Language Processing.
- Statistical machine translation. ACM Computing Surveys (CSUR).
- Regression for machine translation evaluation at the sentence level. Machine Translation.
- Automated metrics for speech translation. PerMIS '09: Proceedings of the 9th Workshop on Performance Metrics for Intelligent Systems.
- Linguistic measures for automatic machine translation evaluation. Machine Translation.
- MEDITE: a unilingual textual aligner. FinTAL '06: Proceedings of the 5th International Conference on Advances in Natural Language Processing.
- Corroborating text evaluation results with heterogeneous measures. EMNLP '11: Proceedings of the Conference on Empirical Methods in Natural Language Processing.
- Effective co-reference resolution in clinical text. IEA/AIE '12: Proceedings of the 25th International Conference on Industrial Engineering and Other Applications of Applied Intelligent Systems.
We introduce BLANC, a family of dynamic, trainable evaluation metrics for machine translation. Flexible, parametrized models can be learned from past data and automatically optimized to correlate well with human judgments for different criteria (e.g. adequacy, fluency) using different correlation measures. Towards this end, we discuss ACS (all common skip-ngrams), a practical algorithm with trainable parameters that estimates reference-candidate translation overlap by computing a weighted sum of all common skip-ngrams in polynomial time. We show that the BLEU and ROUGE metric families are special cases of BLANC, and we compare correlations with human judgments across these three metric families. We analyze the algorithmic complexity of ACS and argue that it is more powerful in modeling both local meaning and sentence-level structure, while offering the same practicality as the established algorithms it generalizes.
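The full ACS algorithm carries trainable parameters (e.g. skip-gap penalties), which are not reproduced here. As a rough illustration of the core counting idea only, the following sketch uses a standard dynamic program to count, in polynomial time, all common skip-ngrams (pairs of matching index tuples) between a candidate and a reference, then combines the counts with a hypothetical per-length weight vector. The function names `common_skip_ngrams` and `acs_score`, the uniform gap-agnostic weighting, and the self-overlap normalization are illustrative assumptions, not the paper's implementation.

```python
def common_skip_ngrams(cand, ref, max_n):
    """Count, for each n = 1..max_n, the number of common skip-ngrams
    between token lists cand and ref, i.e. pairs of strictly increasing
    index tuples whose tokens match position by position.

    Dynamic program: N[i][j][k] = number of matching k-tuples drawn from
    cand[:i] and ref[:j]. Runs in O(len(cand) * len(ref) * max_n).
    NOTE: gap sizes are ignored here; the paper's ACS weights them.
    """
    C, R = len(cand), len(ref)
    N = [[[0] * (max_n + 1) for _ in range(R + 1)] for _ in range(C + 1)]
    for i in range(C + 1):
        for j in range(R + 1):
            N[i][j][0] = 1  # the empty skip-ngram always matches
    for i in range(1, C + 1):
        for j in range(1, R + 1):
            for k in range(1, max_n + 1):
                # inclusion-exclusion over shorter prefixes...
                N[i][j][k] = (N[i - 1][j][k] + N[i][j - 1][k]
                              - N[i - 1][j - 1][k])
                # ...plus tuples ending exactly at (i-1, j-1) on a match
                if cand[i - 1] == ref[j - 1]:
                    N[i][j][k] += N[i - 1][j - 1][k - 1]
    return [N[C][R][k] for k in range(1, max_n + 1)]


def acs_score(cand, ref, weights):
    """Hypothetical weighted overlap: sum_k w_k * (#common skip-k-grams),
    each count normalized by the reference's self-overlap so a perfect
    match with uniform-sum-1 weights scores 1.0."""
    counts = common_skip_ngrams(cand, ref, len(weights))
    norms = common_skip_ngrams(ref, ref, len(weights))
    return sum(w * c / n for w, c, n in zip(weights, counts, norms) if n)
```

Restricting `max_n` to contiguous n-grams of one length recovers a BLEU-style count, while `max_n = 2` with unit weights counts ROUGE-S skip-bigrams, which is the sense in which BLANC generalizes both families.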