Measuring user acceptability of machine translations to diagnose system errors: an experience report

  • Authors:
  • Bowen Hui

  • Affiliations:
  • University of Toronto, Canada

  • Venue:
  • COLING-MTIA '02 Proceedings of the 2002 COLING workshop on Machine translation in Asia - Volume 16
  • Year:
  • 2002


Abstract

Conventional ways of measuring machine translation quality compare the accuracy of system output without clearly specifying what "accuracy" entails. Many current evaluation methods also demand too much time from expert human evaluators. Moreover, these methods give no direct feedback on user acceptability of the system and do not hint at areas of focus for researchers or developers. In this work, we explore an output inspection method that measures user acceptance and probes system errors, so that developers and researchers can walk away knowing what was acceptable and what to improve on. We describe the evaluation framework for machine translation and present experimental results for two systems. The results of the experiments are very encouraging. We conclude with a discussion of the translation quality factors that matter most to users, a pilot study applying this evaluation in the text summarization domain, and ideas on how to use the gathered data to create user profiles.