Hovy, E., King, M., Popescu-Belis, A. (2002). Principles of Context-Based Machine Translation Evaluation. Machine Translation, 17(1).
Papineni, K., Roukos, S., Ward, T., Zhu, W.-J. (2002). BLEU: a Method for Automatic Evaluation of Machine Translation. Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL '02).
Melamed, I. D., Green, R., Turian, J. P. (2003). Precision and Recall of Machine Translation. Companion Volume of the Proceedings of HLT-NAACL 2003 -- Short Papers (NAACL-Short '03).
Doddington, G. (2002). Automatic Evaluation of Machine Translation Quality Using N-gram Co-occurrence Statistics. Proceedings of the Second International Conference on Human Language Technology Research (HLT '02).
Callison-Burch, C., Fordyce, C., Koehn, P., Monz, C., Schroeder, J. (2007). (Meta-) Evaluation of Machine Translation. Proceedings of the Second Workshop on Statistical Machine Translation (StatMT '07).
Koehn, P. (2010). Statistical Machine Translation. Cambridge University Press.
Stymne, S. (2011). BLAST: a Tool for Error Analysis of Machine Translation Output. Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies (HLT '11): Systems Demonstrations.
This paper presents work on the manual and automatic evaluation of the freely available online machine translation (MT) service Google Translate for the English-Croatian language pair, in the legislation and general domains. The experimental study is conducted on a test set of 200 sentences. Human evaluation is performed by native speakers using the criteria of fluency and adequacy, and is complemented by error analysis. Automatic evaluation is performed against a single reference set using the following metrics: BLEU, NIST, F-measure and WER. The influence of lowercasing, tokenization and punctuation is discussed. Pearson's correlation is reported between the automatic metrics, as well as between the two human criteria, fluency and adequacy, and the automatic metrics.
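To make the correlation step concrete, the sketch below shows how sentence-level scores from one automatic metric (BLEU, via NLTK) can be correlated with human adequacy ratings using Pearson's r (via SciPy). The sentences, reference translations and ratings are hypothetical placeholders, not the paper's test set; NIST, F-measure and WER scores would be correlated with the human criteria in exactly the same way.

```python
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction
from scipy.stats import pearsonr

# Hypothetical MT outputs and their single reference translations,
# whitespace-tokenized and lowercased -- the kind of preprocessing
# whose influence the paper discusses.
hypotheses = [
    "the law enters into force on the day of publication",
    "citizens have right to free education",
    "the committee adopted the report yesterday",
    "this decision applies from 1 january",
]
references = [
    "the law shall enter into force on the day of its publication",
    "citizens have the right to free education",
    "the committee adopted the report yesterday",
    "this decision shall apply from 1 january",
]

# Sentence-level BLEU with smoothing (unsmoothed sentence BLEU is
# unstable on short segments with few n-gram matches).
smooth = SmoothingFunction().method1
bleu = [
    sentence_bleu([ref.split()], hyp.split(), smoothing_function=smooth)
    for hyp, ref in zip(hypotheses, references)
]

# Hypothetical human adequacy ratings, one per sentence.
adequacy = [3, 4, 5, 4]

# Pearson's correlation between the automatic metric and the human criterion.
r, p = pearsonr(bleu, adequacy)
print(f"sentence BLEU: {[round(b, 3) for b in bleu]}")
print(f"Pearson r(BLEU, adequacy) = {r:.3f} (p = {p:.3f})")
```

A corpus-level score over the full 200-sentence test set would be computed analogously with NLTK's corpus_bleu, and the same pearsonr call serves for metric-to-metric correlations.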