The Journal of Machine Learning Research
A web-based kernel function for measuring the similarity of short text snippets
Proceedings of the 15th international conference on World Wide Web
Extending WHIRL with background knowledge for improved text classification
Information Retrieval
Self-taught learning: transfer learning from unlabeled data
Proceedings of the 24th international conference on Machine learning
Proceedings of the 17th international conference on World Wide Web
Proceedings of the 25th international conference on Machine learning
Topic-bridged PLSA for cross-domain text classification
Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval
TwitterRank: finding topic-sensitive influential twitterers
Proceedings of the third ACM international conference on Web search and data mining
Similarity measures for short segments of text
ECIR'07 Proceedings of the 29th European conference on IR research
IEEE Transactions on Knowledge and Data Engineering
Comparing twitter and traditional media using topic models
ECIR'11 Proceedings of the 33rd European conference on Advances in information retrieval
Short text classification improved by learning multi-granularity topics
IJCAI'11 Proceedings of the Twenty-Second international joint conference on Artificial Intelligence - Volume Volume Three
A general evaluation measure for document organization tasks
Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval
Hi-index | 0.00 |
Microblogs play an important role for Online Reputation Management. Companies and organizations in general have an increasing interest in obtaining the last minute information about which are the emerging topics that concern their reputation. In this paper, we present a new technique to cluster a collection of tweets emitted within a short time span about a specific entity. Our approach relies on transfer learning by contextualizing a target collection of tweets with a large set of unlabeled "background" tweets that help improving the clustering of the target collection. We include background tweets together with target tweets in a TwitterLDA process, and we set the total number of clusters. In practice, this means that the system can adapt to find the right number of clusters for the target data, overcoming one of the limitations of using LDA-based approaches (the need of establishing a priori the number of clusters). Our experiments using RepLab 2012 data show that using the background collection gives a 20% improvement over a direct application of TwitterLDA using only the target collection. Our data also confirms that the approach can effectively predict the right number of target clusters in a way that is robust with respect to the total number of clusters established a priori.