Cost-sensitive active learning for computer-assisted translation

Authors:
Jesús González-Rubio;Francisco Casacuberta
Affiliations:
-;-
Venue:
Pattern Recognition Letters
Year:
2014

Citing 35
Cited 0

Training connectionist networks with queries and selective sampling

Advances in neural information processing systems 2
A sequential algorithm for training text classifiers

SIGIR '94 Proceedings of the 17th annual international ACM SIGIR conference on Research and development in information retrieval
Improving Generalization with Active Learning

Machine Learning - Special issue on structured connectionist systems
A view of the EM algorithm that justifies incremental, sparse, and other variants

Learning in graphical models
Target-Text Mediated Interactive Machine Translation

Machine Translation
Queries and Concept Learning

Machine Learning
Queries and Concept Learning

Machine Learning
Toward Optimal Active Learning through Sampling Estimation of Error Reduction

ICML '01 Proceedings of the Eighteenth International Conference on Machine Learning
Active Learning for Natural Language Parsing and Information Extraction

ICML '99 Proceedings of the Sixteenth International Conference on Machine Learning
Text prediction for translators

Text prediction for translators
Trans Type: Development-Evaluation Cycles to Boost Translator's Productivity

Machine Translation
The mathematics of statistical machine translation: parameter estimation

Computational Linguistics - Special issue on using large corpora: II
HMM-based word alignment in statistical translation

COLING '96 Proceedings of the 16th conference on Computational linguistics - Volume 2
Efficient search for interactive statistical machine translation

EACL '03 Proceedings of the tenth conference on European chapter of the Association for Computational Linguistics - Volume 1
The statistical significance of the MUC-4 results

MUC4 '92 Proceedings of the 4th conference on Message understanding
Discriminative training and maximum entropy models for statistical machine translation

ACL '02 Proceedings of the 40th Annual Meeting on Association for Computational Linguistics
BLEU: a method for automatic evaluation of machine translation

ACL '02 Proceedings of the 40th Annual Meeting on Association for Computational Linguistics
Statistical phrase-based translation

NAACL '03 Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology - Volume 1
Minimum error rate training in statistical machine translation

ACL '03 Proceedings of the 41st Annual Meeting on Association for Computational Linguistics - Volume 1
TransType: a computer-aided translation typing system

NAACL-ANLP-EMTS '00 Proceedings of the 2000 NAACL-ANLP Workshop on Embedded machine translation systems - Volume 5
Learning finite-state models for machine translation

Machine Learning
Confidence estimation for machine translation

COLING '04 Proceedings of the 20th international conference on Computational Linguistics
Word-Level Confidence Estimation for Machine Translation

Computational Linguistics
Hierarchical sampling for active learning

Proceedings of the 25th international conference on Machine learning
Statistical approaches to computer-assisted translation

Computational Linguistics
An analysis of active learning strategies for sequence labeling tasks

EMNLP '08 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Active learning for statistical phrase-based machine translation

NAACL '09 Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics
(Meta-) evaluation of machine translation

StatMT '07 Proceedings of the Second Workshop on Statistical Machine Translation
Manual and automatic evaluation of machine translation between European languages

StatMT '06 Proceedings of the Workshop on Statistical Machine Translation
Online learning for interactive statistical machine translation

HLT '10 Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics
Balancing user effort and translation error in interactive machine translation via confidence measures

ACLShort '10 Proceedings of the ACL 2010 Conference Short Papers
Does more data always yield better translations?

EACL '12 Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics
Active learning for interactive machine translation

EACL '12 Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics
Comparing human perceptions of post-editing effort with post-editing operations

WMT '12 Proceedings of the Seventh Workshop on Statistical Machine Translation
Analysing the effect of out-of-domain data on SMT systems

WMT '12 Proceedings of the Seventh Workshop on Statistical Machine Translation

Quantified Score

Hi-index	0.10

Visualization

Abstract

Machine translation technology is not perfect. To be successfully embedded in real-world applications, it must compensate for its imperfections by interacting intelligently with the user within a computer-assisted translation framework. The interactive-predictive paradigm, where both a statistical translation model and a human expert collaborate to generate the translation, has been shown to be an effective computer-assisted translation approach. However, the exhaustive supervision of all translations and the use of non-incremental translation models penalizes the productivity of conventional interactive-predictive systems. We propose a cost-sensitive active learning framework for computer-assisted translation whose goal is to make the translation process as painless as possible. In contrast to conventional active learning scenarios, the proposed active learning framework is designed to minimize not only how many translations the user must supervise but also how difficult each translation is to supervise. To do that, we address the two potential drawbacks of the interactive-predictive translation paradigm. On the one hand, user effort is focused to those translations whose user supervision is considered more ''informative'', thus, maximizing the utility of each user interaction. On the other hand, we use a dynamic machine translation model that is continually updated with user feedback after deployment. We empirically validated each of the technical components in simulation and quantify the user effort saved. We conclude that both selective translation supervision and translation model updating lead to important user-effort reductions, and consequently to improved translation productivity.