Using Mechanical Turk to annotate lexicons for less commonly used languages

Authors:
Ann Irvine;Alexandre Klementiev
Affiliations:
Johns Hopkins University, Baltimore, MD;Johns Hopkins University, Baltimore, MD
Venue:
CSLDAMT '10 Proceedings of the NAACL HLT 2010 Workshop on Creating Speech and Language Data with Amazon's Mechanical Turk
Year:
2010

Citing 8
Cited 9

Estimating Word Translation Probabilities from Unrelated Monolingual Corpora Using the EM Algorithm

Proceedings of the Seventeenth National Conference on Artificial Intelligence and Twelfth Conference on Innovative Applications of Artificial Intelligence
An IR approach for translating new words from nonparallel, comparable texts

COLING '98 Proceedings of the 17th international conference on Computational linguistics - Volume 1
Automatic identification of word translations from unrelated English and German corpora

ACL '99 Proceedings of the 37th annual meeting of the Association for Computational Linguistics on Computational Linguistics
Inducing translation lexicons via diverse similarity measures and bridge languages

COLING-02 proceedings of the 6th conference on Natural language learning - Volume 20
Improving translation lexicon induction from monolingual corpora via dependency contexts and part-of-speech equivalences

CoNLL '09 Proceedings of the Thirteenth Conference on Computational Natural Language Learning
Cheap and fast---but is it good?: evaluating non-expert annotations for natural language tasks

EMNLP '08 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Fast, cheap, and creative: evaluating translation quality using Amazon's Mechanical Turk

EMNLP '09 Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 1 - Volume 1
Polylingual topic models

EMNLP '09 Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 2 - Volume 2

Creating speech and language data with Amazon's Mechanical Turk

CSLDAMT '10 Proceedings of the NAACL HLT 2010 Workshop on Creating Speech and Language Data with Amazon's Mechanical Turk
Collecting highly parallel data for paraphrase evaluation

HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1
Crowdsourcing translation: professional quality from non-professionals

HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1
The Arabic online commentary dataset: an annotated dataset of informal Arabic with high dialectal content

HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies: short papers - Volume 2
They can help: using crowdsourcing to improve the evaluation of grammatical error detection systems

HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies: short papers - Volume 2
Crowdsourcing research opportunities: lessons from natural language processing

Proceedings of the 12th International Conference on Knowledge Management and Knowledge Technologies
Language identification for creating language-specific Twitter collections

LSM '12 Proceedings of the Second Workshop on Language in Social Media
Perspectives on crowdsourcing annotations for natural language processing

Language Resources and Evaluation
Bucking the trend: improved evaluation and annotation practices for ESL error detection systems

Language Resources and Evaluation

Quantified Score

Hi-index	0.00

Visualization

Abstract

In this work we present results from using Amazon's Mechanical Turk (MTurk) to annotate translation lexicons between English and a large set of less commonly used languages. We generate candidate translations for 100 English words in each of 42 foreign languages using Wikipedia and a lexicon induction framework. We evaluate the MTurk annotations by using positive and negative control candidate translations. Additionally, we evaluate the annotations by adding pairs to our seed dictionaries, providing a feedback loop into the induction system. MTurk workers are more successful in annotating some languages than others and are not evenly distributed around the world or among the world's languages. However, in general, we find that MTurk is a valuable resource for gathering cheap and simple annotations for most of the languages that we explored, and these annotations provide useful feedback in building a larger, more accurate lexicon.