On the difficulty of clustering company tweets

Authors:
Fernando Perez-Tellez;David Pinto;John Cardiff;Paolo Rosso
Affiliations:
Institute of Technology Tallaght Dublin, Dublin, Ireland;Benemérita Universidad Autónoma de Puebla, Puebla, Mexico;Institute of Technology Tallaght Dublin, Dublin, Ireland;Universidad Politécnica de Valencia, Valencia, Spain
Venue:
SMUC '10 Proceedings of the 2nd international workshop on Search and mining user-generated contents
Year:
2010

Abstract

Twitter is a successful Web 2.0 technology used by millions of people and companies to publish brief messages ("tweets") that share experiences and opinions about products and services. Given the huge volume of information published this way, there is a clear need for systems that can mine these messages to derive information about the collective thinking of twitterers (e.g. for opinion or sentiment analysis). Tweet analysis is an important task because comments, opinions, suggestions, and complaints can inform marketing strategies or reveal a company's reputation. This requires first establishing whether a tweet refers to a company at all, which is not a straightforward keyword search because a company name can appear in multiple contexts. This work presents and compares several clustering-based approaches for determining whether a given tweet refers to a particular company. To improve the representation of tweets, and consequently the performance of clustering company tweets, we apply an enriching methodology. The results obtained are promising and highlight the difficulty of the task.
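To make the task concrete, the following is a minimal sketch of the general idea: optionally enrich each tweet with related terms, represent it as a TF-IDF vector, and split the collection into two clusters (tweets about the company versus other uses of the name). The abstract does not say which clustering algorithms or which enrichment method were compared, so everything here is an assumption for illustration: the cosine-based two-cluster k-means, the TF-IDF weighting, the `enrich` helper with its hand-made `related_terms` dictionary, and the example tweets are all hypothetical.

```python
import math
from collections import Counter


def enrich(tweet, related_terms):
    """Append related terms for each token that has an entry.

    Hypothetical stand-in for an enrichment step; not the paper's method.
    """
    tokens = tweet.lower().split()
    extra = [term for tok in tokens for term in related_terms.get(tok, [])]
    return " ".join(tokens + extra)


def tfidf_vectors(docs):
    """Return one L2-normalised sparse TF-IDF vector (dict) per document."""
    tokenised = [doc.lower().split() for doc in docs]
    doc_freq = Counter(term for tokens in tokenised for term in set(tokens))
    n_docs = len(docs)
    vectors = []
    for tokens in tokenised:
        counts = Counter(tokens)
        # Smoothed IDF so that terms present in every document keep a weight.
        vec = {t: c * (math.log((1 + n_docs) / (1 + doc_freq[t])) + 1.0)
               for t, c in counts.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors


def cosine(a, b):
    """Cosine similarity of two L2-normalised sparse vectors."""
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def centroid(members):
    """L2-normalised sum of the member vectors (same direction as the mean)."""
    total = {}
    for vec in members:
        for t, w in vec.items():
            total[t] = total.get(t, 0.0) + w
    norm = math.sqrt(sum(w * w for w in total.values())) or 1.0
    return {t: w / norm for t, w in total.items()}


def two_means(vectors, max_iter=20):
    """Split vectors into two clusters (labels 0/1) with cosine k-means."""
    # Deterministic seeds: the first vector and the one least similar to it.
    far = min(range(len(vectors)), key=lambda i: cosine(vectors[0], vectors[i]))
    centroids = [vectors[0], vectors[far]]
    labels = None
    for _ in range(max_iter):
        new_labels = [0 if cosine(v, centroids[0]) >= cosine(v, centroids[1]) else 1
                      for v in vectors]
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
        for k in (0, 1):
            members = [v for v, lab in zip(vectors, labels) if lab == k]
            if members:  # keep the old centroid if a cluster empties
                centroids[k] = centroid(members)
    return labels


# Made-up tweets mentioning the ambiguous name "apple".
tweets = [
    "apple releases new iphone update",
    "new apple iphone launch today",
    "baked an apple pie with cinnamon",
    "apple pie recipe with fresh cinnamon",
]
labels = two_means(tfidf_vectors(tweets))
print(labels)
```

On this toy input the two product tweets end up in one cluster and the two recipe tweets in the other. Real tweets are much shorter and noisier, which is where an enrichment step such as the hypothetical `enrich(tweet, {"iphone": ["apple", "smartphone"]})` could add shared vocabulary before vectorisation.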