Using Mechanical Turk to build machine translation evaluation sets

Authors:
Michael Bloodgood;Chris Callison-Burch
Affiliations:
Johns Hopkins University;Johns Hopkins University
Venue:
CSLDAMT '10 Proceedings of the NAACL HLT 2010 Workshop on Creating Speech and Language Data with Amazon's Mechanical Turk
Year:
2010

Citing 9
Cited 11

Scalable inference and training of context-rich syntactic translation models

ACL-44 Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics
Hierarchical Phrase-Based Translation

Computational Linguistics
Cheap and fast---but is it good?: evaluating non-expert annotations for natural language tasks

EMNLP '08 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Taking into account the differences between actively and passively acquired data: the case of active learning with support vector machines for imbalanced datasets

NAACL-Short '09 Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Companion Volume: Short Papers
Further meta-evaluation of machine translation

StatMT '08 Proceedings of the Third Workshop on Statistical Machine Translation
Findings of the 2009 workshop on statistical machine translation

StatMT '09 Proceedings of the Fourth Workshop on Statistical Machine Translation
Joshua: an open source toolkit for parsing-based machine translation

StatMT '09 Proceedings of the Fourth Workshop on Statistical Machine Translation
Syntax augmented machine translation via chart parsing

StatMT '06 Proceedings of the Workshop on Statistical Machine Translation
Fast, cheap, and creative: evaluating translation quality using Amazon's Mechanical Turk

EMNLP '09 Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 1 - Volume 1

Bucking the trend: large-scale cost-focused active learning for statistical machine translation

ACL '10 Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics
Creating speech and language data with Amazon's Mechanical Turk

CSLDAMT '10 Proceedings of the NAACL HLT 2010 Workshop on Creating Speech and Language Data with Amazon's Mechanical Turk
Collecting highly parallel data for paraphrase evaluation

HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1
Crowdsourcing translation: professional quality from non-professionals

HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1
Parallel sentence generation from comparable corpora for improved SMT

Machine Translation
Data-driven response generation in social media

EMNLP '11 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Divide and conquer: crowdsourcing the creation of cross-lingual textual entailment corpora

EMNLP '11 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Towards multimodal deception detection -- step 1: building a collection of deceptive videos

Proceedings of the 14th ACM international conference on Multimodal interaction
How to filter out random clickers in a crowdsourcing-based study?

Proceedings of the 2012 BELIV Workshop: Beyond Time and Errors - Novel Evaluation Methods for Visualization
Perspectives on crowdsourcing annotations for natural language processing

Language Resources and Evaluation
Constructing an anonymous dataset from the personal digital photo libraries of mac app store users

Proceedings of the 13th ACM/IEEE-CS joint conference on Digital libraries

Quantified Score

Hi-index	0.00

Visualization

Abstract

Building machine translation (MT) test sets is a relatively expensive task. As MT becomes increasingly desired for more and more language pairs and more and more domains, it becomes necessary to build test sets for each case. In this paper, we investigate using Amazon's Mechanical Turk (MTurk) to make MT test sets cheaply. We find that MTurk can be used to make test sets much cheaper than professionally-produced test sets. More importantly, in experiments with multiple MT systems, we find that the MTurk-produced test sets yield essentially the same conclusions regarding system performance as the professionally-produced test sets yield.