Creating a bi-lingual entailment corpus through translations with Mechanical Turk: $100 for a 10-day rush

Authors:
Matteo Negri; Yashar Mehdad
Affiliations:
FBK-Irst, Trento, Italy; University of Trento, Trento, Italy
Venue:
CSLDAMT '10 Proceedings of the NAACL HLT 2010 Workshop on Creating Speech and Language Data with Amazon's Mechanical Turk
Year:
2010

Abstract

This paper reports on experiments in creating a bi-lingual Textual Entailment corpus with a non-expert workforce under strict cost and time limits ($100, 10 days). To this end, workers were hired for translation and validation tasks through the CrowdFlower channel to Amazon Mechanical Turk. The result is an accurate and reliable corpus of 426 English/Spanish entailment pairs, produced more cost-effectively than with other crowdsourcing-based methods for acquiring translations. Focusing on two orthogonal dimensions (i.e., the reliability of annotations made by non-experts and the overall corpus creation costs), we summarize the methodology we adopted, the results achieved, the main problems encountered, and the lessons learned.