An unsupervised transfer learning approach to discover topics for online reputation management

Authors:
Tamara Martín-Wanton; Julio Gonzalo; Enrique Amigó
Affiliations:
UNED, Madrid, Spain; UNED, Madrid, Spain; UNED, Madrid, Spain
Venue:
Proceedings of the 22nd ACM International Conference on Information & Knowledge Management
Year:
2013

Abstract

Microblogs play an important role in Online Reputation Management. Companies, and organizations in general, are increasingly interested in obtaining up-to-the-minute information about the emerging topics that concern their reputation. In this paper, we present a new technique to cluster a collection of tweets about a specific entity emitted within a short time span. Our approach relies on transfer learning: the target collection of tweets is contextualized with a large set of unlabeled "background" tweets that help improve the clustering of the target collection. We feed background tweets together with target tweets into a TwitterLDA process and set only the total number of clusters for the combined collection; the number of clusters into which the target tweets fall is not fixed in advance. In practice, this means that the system can adapt to find the right number of clusters for the target data, overcoming one of the limitations of LDA-based approaches (the need to establish the number of clusters a priori). Our experiments using RepLab 2012 data show that using the background collection gives a 20% improvement over a direct application of TwitterLDA to the target collection alone. Our data also confirm that the approach can effectively predict the right number of target clusters in a way that is robust with respect to the total number of clusters established a priori.
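The core idea (fix only the total number of topics for target plus background tweets, then count how many distinct topics the target tweets actually land in) can be illustrated with a minimal sketch. The abstract gives no implementation details, so this is not the authors' code and not TwitterLDA: as a plainly swapped-in stand-in it uses a simpler one-topic-per-tweet mixture of unigrams (a Dirichlet multinomial mixture) sampled with collapsed Gibbs sampling, which omits TwitterLDA's per-user topic distributions and background-word switch. The function names `gibbs_single_topic_mixture` and `cluster_target`, the hyperparameters `alpha`, `beta` and `iterations`, and the toy tweets are all hypothetical.

```python
import math
import random
from collections import Counter

# Sketch only: a simplified stand-in for TwitterLDA (one topic per tweet,
# no per-user topic mixtures, no background-word switch). Names are hypothetical.


def gibbs_single_topic_mixture(docs, num_topics, alpha=0.1, beta=0.01,
                               iterations=50, seed=0):
    """Collapsed Gibbs sampling for a one-topic-per-document mixture of unigrams.

    Each document (tweet) is a list of tokens and receives exactly one topic.
    Returns a list with the sampled topic index of every document.
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_counts = [Counter(doc) for doc in docs]
    z = [rng.randrange(num_topics) for _ in docs]  # random initial topics
    docs_in_topic = [0] * num_topics
    tokens_in_topic = [0] * num_topics
    word_in_topic = [Counter() for _ in range(num_topics)]
    for d, k in enumerate(z):
        docs_in_topic[k] += 1
        tokens_in_topic[k] += len(docs[d])
        word_in_topic[k].update(doc_counts[d])

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            # Remove document d from its current topic's counts.
            k = z[d]
            docs_in_topic[k] -= 1
            tokens_in_topic[k] -= len(doc)
            word_in_topic[k].subtract(doc_counts[d])
            # Log of the unnormalised conditional probability of each topic.
            log_p = []
            for t in range(num_topics):
                lp = math.log(docs_in_topic[t] + alpha)
                for w, c in doc_counts[d].items():
                    for j in range(c):
                        lp += math.log(word_in_topic[t][w] + beta + j)
                for i in range(len(doc)):
                    lp -= math.log(tokens_in_topic[t] + vocab_size * beta + i)
                log_p.append(lp)
            # Sample a new topic (subtract the max before exp to avoid overflow).
            top = max(log_p)
            weights = [math.exp(lp - top) for lp in log_p]
            k = rng.choices(range(num_topics), weights=weights)[0]
            # Add document d back under its new topic.
            z[d] = k
            docs_in_topic[k] += 1
            tokens_in_topic[k] += len(doc)
            word_in_topic[k].update(doc_counts[d])
    return z


def cluster_target(target, background, total_topics, **kwargs):
    """Cluster target tweets jointly with background tweets.

    Only total_topics is fixed; the number of target clusters is the number
    of distinct topics the target tweets end up in.
    """
    z = gibbs_single_topic_mixture(target + background, total_topics, **kwargs)
    target_topics = z[:len(target)]
    used = sorted(set(target_topics))
    relabel = {k: i for i, k in enumerate(used)}  # compact labels 0..n-1
    return [relabel[k] for k in target_topics], len(used)


# Hypothetical toy data: tokenised tweets about one entity plus unrelated ones.
target = [["battery", "recall", "phone"], ["battery", "fire", "recall"],
          ["ceo", "resigns", "today"], ["new", "ceo", "appointed"]]
background = [["rain", "weather", "forecast"], ["goal", "football", "match"],
              ["weather", "sunny", "weekend"], ["match", "football", "league"],
              ["battery", "life", "laptop"], ["ceo", "interview", "tech"]]

labels, n_clusters = cluster_target(target, background, total_topics=6,
                                    iterations=30, seed=1)
print(labels, n_clusters)
```

Rerunning `cluster_target` on real data with several values of `total_topics` is one way to check whether the inferred number of target clusters stays stable, which is the robustness property the abstract reports; a toy run like this one says nothing about that property by itself.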