With the rapid growth of social Web applications such as Twitter and of online advertisements, understanding short texts is becoming increasingly important. Most traditional text mining techniques are designed for long documents. For short text messages, many existing techniques are ineffective because the text representations are sparse. We observe that it is often possible to find topically related long texts, which can serve as auxiliary data when mining the target short-text data. In this article, we present a novel approach that clusters short text messages via transfer learning from auxiliary long-text data. We show that although some previous work enhances short text clustering with related long texts, most of it ignores the semantic and topical inconsistencies between the target and auxiliary data, which can hurt clustering performance. To accommodate the possible inconsistency between source and target data, we propose a novel topic model, the Dual Latent Dirichlet Allocation (DLDA) model, which jointly learns two sets of topics on short and long texts and couples the topic parameters to cope with the potential inconsistency between the data sets. Through large-scale clustering experiments on both advertisement and Twitter data, we demonstrate superior performance over several state-of-the-art techniques for clustering short text documents.
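The sparseness problem, and the basic idea of borrowing from long texts, can be illustrated with a small sketch. This is not the DLDA model. It is a deliberately naive expansion scheme of the kind that treats auxiliary data as fully consistent with the target data. The toy texts, the function names (`bag_of_words`, `cosine`, `expand`), and the nearest-long-text heuristic are all assumptions made for illustration.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Lowercase whitespace tokenization into a term-count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def expand(short_vec, long_vecs):
    """Naive expansion (an assumption, not DLDA): add the term counts
    of the most similar long text to the short text's vector."""
    best = max(long_vecs, key=lambda lv: cosine(short_vec, lv))
    return short_vec + best


# Toy data, invented for illustration only.
short_a = bag_of_words("cheap flights paris")
short_b = bag_of_words("discount airfare france")
long_texts = [
    bag_of_words(
        "airlines offer cheap flights and discount airfare "
        "to paris france every summer"
    ),
    bag_of_words("the new phone has a fast processor and a bright screen"),
]

# The two short texts share no terms, so their raw similarity is zero
# even though they are topically related.
raw_similarity = cosine(short_a, short_b)

# After borrowing terms from the related long text, they overlap.
expanded_similarity = cosine(
    expand(short_a, long_texts), expand(short_b, long_texts)
)

print(raw_similarity, expanded_similarity)
```

In this toy case the expansion helps because the long text matches both short texts well. When the auxiliary text drifts from the target topic, the same scheme imports unrelated terms, which is the inconsistency that the abstract says a coupled model is meant to handle.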