Discovering filter keywords for company name disambiguation in Twitter

Authors:
Damiano Spina;Julio Gonzalo;Enrique Amigó
Venue:
Expert Systems with Applications: An International Journal
Year:
2013

Abstract

A major problem in monitoring the online reputation of companies, brands, and other entities is that entity names are often ambiguous (apple may refer to the company, the fruit, the singer, etc.). The problem is particularly hard in microblogging services such as Twitter, where texts are very short and provide little context for disambiguation. In this paper we address the filtering task of determining which tweets, out of a set of tweets that contain a company name, actually refer to the company. Our approach relies on the identification of filter keywords: words whose presence in a tweet reliably confirms (positive keywords) or rules out (negative keywords) that the tweet refers to the company. We describe an algorithm that extracts filter keywords without using any previously annotated data about the target company. The algorithm classifies 58% of the tweets with 75% accuracy; these tweets can then be used to train a machine learning algorithm that classifies all tweets with an overall accuracy of 73%. In comparison, 10-fold cross-validation of the same machine learning algorithm gives an accuracy of 85%, i.e., our unsupervised algorithm incurs a 14% relative loss (1 - 73/85 ≈ 0.14) with respect to its supervised counterpart. Our study also shows that (i) filter keywords for Twitter do not derive directly from public information about the company on the Web: a manual selection of keywords from relevant web sources covers only 15% of the tweets, with 86% accuracy; and (ii) filter keywords can indeed be a productive way of classifying tweets: the five best possible keywords cover, on average, 28% of the tweets for a company in our test collection.
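The two-stage idea in the abstract (filter keywords label part of the tweets, and those labels then train a classifier for the rest) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract describes neither how keywords are discovered nor which machine learning algorithm is used. The keyword sets below are hypothetical and hand-picked, the rule that a tweet with both positive and negative keywords stays unlabeled is an assumption, and a multinomial Naive Bayes classifier is swapped in as a stand-in learner. The functions `keyword_label` and `filter_tweets` are illustrative names only.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive lowercase whitespace tokenizer (an assumption; real tweets need more care).
    return text.lower().split()


def keyword_label(tweet, positive, negative):
    """Return True (refers to the company), False (does not) or None (undecided)."""
    tokens = set(tokenize(tweet))
    has_pos = bool(tokens & positive)
    has_neg = bool(tokens & negative)
    if has_pos and not has_neg:
        return True
    if has_neg and not has_pos:
        return False
    return None  # no filter keyword, or conflicting keywords (assumed rule)


class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing (stand-in learner)."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.prior = {c: math.log(labels.count(c) / len(labels)) for c in self.classes}
        self.counts = {c: Counter() for c in self.classes}
        for doc, c in zip(docs, labels):
            self.counts[c].update(tokenize(doc))
        self.vocab = set().union(*self.counts.values())
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        return self

    def predict(self, doc):
        v = len(self.vocab)

        def log_score(c):
            return self.prior[c] + sum(
                math.log((self.counts[c][t] + 1) / (self.totals[c] + v))
                for t in tokenize(doc)
            )

        return max(self.classes, key=log_score)


def filter_tweets(tweets, positive, negative):
    """Label tweets with filter keywords, then classify the rest with a learner
    trained only on the keyword-labeled tweets. Returns (labels, coverage)."""
    seeds = [(t, keyword_label(t, positive, negative)) for t in tweets]
    labeled = [(t, y) for t, y in seeds if y is not None]
    if not labeled:
        raise ValueError("no tweet contains a filter keyword")
    coverage = len(labeled) / len(tweets)
    model = NaiveBayes().fit([t for t, _ in labeled], [y for _, y in labeled])
    # Keep keyword labels where available; let the classifier decide the rest.
    return [y if y is not None else model.predict(t) for t, y in seeds], coverage


# Hypothetical example for the ambiguous name "apple".
tweets = [
    "new apple iphone released today",
    "apple pie recipe with cinnamon",
    "apple stock rises after iphone sales",
    "baking an apple pie tonight",
    "apple ipad discount",
    "apple sales beat expectations",  # no filter keyword: left to the classifier
]
positive = {"iphone", "ipad"}
negative = {"pie", "recipe"}

labels, coverage = filter_tweets(tweets, positive, negative)
print(labels, round(coverage, 2))
```

Here the keywords cover five of the six tweets, and the classifier assigns the remaining tweet to the company class because words such as "sales" occur only in keyword-labeled company tweets. The coverage value returned by `filter_tweets` corresponds to the "percentage of tweets classified" figures quoted in the abstract.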