How good is the crowd at "real" WSD?

  • Authors:
  • Jisup Hong; Collin F. Baker

  • Affiliations:
  • International Computer Science Institute, Berkeley, CA (both authors)

  • Venue:
  • LAW V '11 Proceedings of the 5th Linguistic Annotation Workshop
  • Year:
  • 2011

Abstract

There has been a great deal of excitement recently about using the "wisdom of the crowd" to collect data of all kinds, quickly and cheaply (Howe, 2008; von Ahn and Dabbish, 2008). Snow et al. (2008) were the first to give a convincing demonstration that at least some kinds of linguistic data can be gathered from workers on the web as accurately as, and more cheaply than, from local experts, and there has been a steady stream of papers and workshops since then reporting similar results, e.g. (Callison-Burch and Dredze, 2010). Many of the tasks that have been successfully crowdsourced involve judgments similar to those made in everyday life, such as recognizing unclear writing (von Ahn et al., 2008); for those tasks that do require considerable judgment, the responses are usually binary or drawn from a small set of options, as in sentiment analysis (Mellebeek et al., 2010) or ratings (Heilman and Smith, 2010). Since the FrameNet annotation process is known to be relatively expensive, we were interested in whether its fine word sense discrimination and marking of dependents with semantic roles could be performed more cheaply and equally accurately using Amazon's Mechanical Turk (AMT) or similar resources. We report on a partial success in this respect and how it was achieved.