Evaluation in the ARPA machine translation program: 1993 methodology

Authors:
John S. White;Theresa A. O'Connell
Affiliations:
PRC Inc., McLean, VA;PRC Inc., McLean, VA
Venue:
HLT '94 Proceedings of the workshop on Human Language Technology
Year:
1994

Citing 1

Evaluation of machine translation
HLT '93 Proceedings of the workshop on Human Language Technology

Cited 9

Translation with Scarce Bilingual Resources
Machine Translation

The automatic translation of discourse structures
NAACL 2000 Proceedings of the 1st North American chapter of the Association for Computational Linguistics conference

Example-Based Machine Translation in the Pangloss system
COLING '96 Proceedings of the 16th conference on Computational linguistics - Volume 1

Session summary
HLT '94 Proceedings of the workshop on Human Language Technology

An empirical study in multilingual natural language generation: what should a text planner do?
INLG '00 Proceedings of the first international conference on Natural language generation - Volume 14

End-to-end evaluation in simultaneous translation
EACL '09 Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics

ILR-based MT comprehension test with multi-level questions
NAACL-Short '07 Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics; Companion Volume, Short Papers

Unification-based glossing
IJCAI'95 Proceedings of the 14th international joint conference on Artificial intelligence - Volume 2

Filling knowledge gaps in a broad coverage machine translation system
IJCAI'95 Proceedings of the 14th international joint conference on Artificial intelligence - Volume 2

Abstract

In the second year of evaluations of the ARPA HLT Machine Translation (MT) Initiative, methodologies developed and tested in 1992 were applied to the 1993 MT test runs. The current methodology optimizes the inherently subjective judgments of translation accuracy and quality by channeling the judgments of non-translators into many data points. These data points reflect the performance of the research MT systems compared both with production MT systems and with novice translators. This paper discusses the three evaluation methods used in the 1993 evaluation, the results of the evaluations, and preliminary characterizations of the Winter 1994 evaluation, now underway. The efforts under discussion focus on measuring the progress of core MT technology and on increasing the sensitivity and portability of MT evaluation methodology.