Named entity recognition in tweets: an experimental study

Authors:
Alan Ritter; Sam Clark; Mausam; Oren Etzioni
Affiliations:
University of Washington, Seattle, WA (all authors)
Venue:
EMNLP '11 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Year:
2011

Abstract

People tweet more than 100 million times daily, yielding a noisy, informal, but sometimes informative corpus of 140-character messages that mirrors the zeitgeist in an unprecedented manner. The performance of standard NLP tools is severely degraded on tweets. This paper addresses this issue by rebuilding the NLP pipeline, from part-of-speech tagging through chunking to named-entity recognition. Our novel T-NER system achieves twice the F1 score of the Stanford NER system. T-NER leverages the redundancy inherent in tweets to achieve this performance, using LabeledLDA to exploit Freebase dictionaries as a source of distant supervision. LabeledLDA outperforms co-training, increasing F1 by 25% over ten common entity types. Our NLP tools are available at: http://github.com/aritter/twitter_nlp
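The distant-supervision idea in the abstract, treating each entity type as a topic and restricting an entity's topic distribution to the types a Freebase-style dictionary lists for it, can be illustrated with a small collapsed Gibbs sampler. This is a minimal sketch under stated assumptions, not the T-NER implementation: the function `labeled_lda_types`, the toy type subset, and the dictionary format are hypothetical; each entity's "document" is assumed to be the bag of context words collected from all of its mentions; entities missing from the dictionary (or mapped to an empty set) may take any type; and `alpha`, `beta`, and `iterations` are arbitrary illustrative values.

```python
import random
from collections import defaultdict


def labeled_lda_types(entity_contexts, type_dictionary, all_types,
                      alpha=0.1, beta=0.1, iterations=200, seed=0):
    """Hypothetical LabeledLDA-style sampler: one topic per entity type.

    entity_contexts: entity string -> list of context words (its "document").
    type_dictionary: entity string -> set of allowed types (assumed
        Freebase-like lookup; missing or empty entries allow every type).
    Returns entity -> {type: estimated probability}, over allowed types only.
    """
    rng = random.Random(seed)
    vocab_size = len({w for words in entity_contexts.values() for w in words})
    allowed = {e: sorted(type_dictionary.get(e) or all_types)
               for e in entity_contexts}

    n_et = defaultdict(int)  # words of entity e assigned to type t
    n_tw = defaultdict(int)  # times word w is assigned to type t
    n_t = defaultdict(int)   # total words assigned to type t
    z = {}                   # (entity, position) -> current type

    # Random initialisation, drawn only from each entity's allowed types.
    for e, words in entity_contexts.items():
        for i, w in enumerate(words):
            t = rng.choice(allowed[e])
            z[(e, i)] = t
            n_et[(e, t)] += 1; n_tw[(t, w)] += 1; n_t[t] += 1

    for _ in range(iterations):
        for e, words in entity_contexts.items():
            for i, w in enumerate(words):
                t = z[(e, i)]
                # Remove the current assignment before resampling it.
                n_et[(e, t)] -= 1; n_tw[(t, w)] -= 1; n_t[t] -= 1
                # Standard collapsed LDA conditional, restricted to the
                # entity's allowed types (the "labeled" constraint).
                weights = [(n_et[(e, k)] + alpha) * (n_tw[(k, w)] + beta)
                           / (n_t[k] + vocab_size * beta)
                           for k in allowed[e]]
                t = rng.choices(allowed[e], weights=weights)[0]
                z[(e, i)] = t
                n_et[(e, t)] += 1; n_tw[(t, w)] += 1; n_t[t] += 1

    # Posterior-mean estimate of each entity's type distribution.
    theta = {}
    for e, words in entity_contexts.items():
        denom = len(words) + len(allowed[e]) * alpha
        theta[e] = {k: (n_et[(e, k)] + alpha) / denom for k in allowed[e]}
    return theta


if __name__ == "__main__":
    # Toy, made-up data using a subset of illustrative type labels.
    types = ["COMPANY", "GEO-LOCATION", "PERSON"]
    contexts = {
        "seattle": ["rain", "downtown", "visit"],
        "microsoft": ["stock", "shares", "ceo"],
        "amazon": ["stock", "shares", "river"],
    }
    dictionary = {"seattle": {"GEO-LOCATION"}, "microsoft": {"COMPANY"},
                  "amazon": {"COMPANY", "GEO-LOCATION"}}
    print(labeled_lda_types(contexts, dictionary, types)["amazon"])
```

Because every assignment is sampled only from an entity's allowed types, an unambiguous dictionary entry pins the entity to a single type, while an ambiguous entry such as `amazon` is resolved by which context words it shares with other entities of each type.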