FS-NER: A Lightweight Filter-Stream Approach to Named Entity Recognition on Twitter Data

Authors:
Diego Marinho de Oliveira; Alberto H.F. Laender; Adriano Veloso; Altigran S. da Silva
Affiliations:
Universidade Federal de Minas Gerais, Belo Horizonte, Brazil; Universidade Federal de Minas Gerais, Belo Horizonte, Brazil; Universidade Federal de Minas Gerais, Belo Horizonte, Brazil; Universidade Federal do Amazonas, Manaus, Brazil
Venue:
Proceedings of the 22nd International Conference on World Wide Web Companion
Year:
2013

Abstract

Microblog platforms such as Twitter are increasingly adopted by Web users and have become an important source of data for web search and mining applications. Tasks such as Named Entity Recognition are at the core of many of these applications, but the effectiveness of existing tools is seriously compromised on Twitter data, since messages are terse, poorly worded and posted in many different languages. Twitter also follows a streaming paradigm, which requires entities to be recognized in real time. In view of these challenges and the inadequacy of existing tools, we propose a novel approach for Named Entity Recognition on Twitter data called FS-NER (Filter-Stream Named Entity Recognition). FS-NER is characterized by the use of filters that process unlabeled Twitter messages, which makes it much more practical than existing supervised CRF-based approaches. These filters can be flexibly combined either in sequence or in parallel. Moreover, because the filters are not language dependent, FS-NER can be applied to different languages without laborious adaptation. Through a systematic evaluation using three Twitter collections and considering seven entity types, we show that FS-NER performs 3% better than a CRF-based baseline, besides being orders of magnitude faster and much more practical.
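
The idea of combining filters in sequence or in parallel can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the filter signature, the two toy filters (`capitalized`, `in_gazetteer`), and the reading of "in sequence" as requiring every filter to accept a token and "in parallel" as requiring at least one to accept it. The actual FS-NER filters and combination rules may differ.

```python
from typing import Callable, Iterable, List

# Hypothetical filter type: given a tokenized message and a position,
# decide whether the token at that position looks like an entity.
Filter = Callable[[List[str], int], bool]


def capitalized(tokens: List[str], i: int) -> bool:
    """Toy filter: accept tokens that start with an uppercase letter."""
    return tokens[i][:1].isupper()


def in_gazetteer(names: Iterable[str]) -> Filter:
    """Toy filter factory: accept tokens found in a list of known names."""
    lowered = {name.lower() for name in names}

    def check(tokens: List[str], i: int) -> bool:
        return tokens[i].lower() in lowered

    return check


def in_sequence(*filters: Filter) -> Filter:
    """Assumed semantics: a token passes only if every filter accepts it."""
    def check(tokens: List[str], i: int) -> bool:
        return all(f(tokens, i) for f in filters)

    return check


def in_parallel(*filters: Filter) -> Filter:
    """Assumed semantics: a token passes if at least one filter accepts it."""
    def check(tokens: List[str], i: int) -> bool:
        return any(f(tokens, i) for f in filters)

    return check


def recognize(tokens: List[str], flt: Filter) -> List[str]:
    """Return the tokens of one message that the combined filter accepts."""
    return [tok for i, tok in enumerate(tokens) if flt(tokens, i)]


# Example message and gazetteer (both made up for this sketch).
tokens = "i saw Obama in paris today".split()
gazetteer = in_gazetteer({"Paris", "Obama"})

strict = recognize(tokens, in_sequence(capitalized, gazetteer))
lenient = recognize(tokens, in_parallel(capitalized, gazetteer))
```

Under these assumptions, the sequential combination favors precision (only `Obama` satisfies both toy filters), while the parallel combination favors recall (the lowercase `paris` is also accepted through the gazetteer). Because each filter inspects one message at a time, the same composition can be applied to messages as they arrive in a stream.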