Annotating named entities in Twitter data with crowdsourcing

  • Authors:
  • Tim Finin;Will Murnane;Anand Karandikar;Nicholas Keller;Justin Martineau;Mark Dredze

  • Affiliations:
  • University of Maryland, Baltimore, MD (Finin, Murnane, Karandikar, Keller, Martineau); Johns Hopkins University, Baltimore, MD (Dredze)

  • Venue:
  • CSLDAMT '10 Proceedings of the NAACL HLT 2010 Workshop on Creating Speech and Language Data with Amazon's Mechanical Turk
  • Year:
  • 2010

Abstract

We describe our experience using both Amazon Mechanical Turk (MTurk) and CrowdFlower to collect simple named entity annotations for Twitter status updates. Unlike most genres that have traditionally been the focus of named entity experiments, Twitter is far more informal and abbreviated. The collected annotations and annotation techniques will provide a first step towards the full study of named entity recognition in domains like Facebook and Twitter. We also briefly describe how to use MTurk to collect judgements on the quality of "word clouds."