Short text similarity based on probabilistic topics

Authors:
Xiaojun Quan;Gang Liu;Zhi Lu;Xingliang Ni;Liu Wenyin
Affiliations:
City University of Hong Kong, Department of Computer Science, Hong Kong SAR, China;City University of Hong Kong, Department of Computer Science, Hong Kong SAR, China;City University of Hong Kong, Department of Computer Science, Hong Kong SAR, China;University of Science and Technology of China, Department of Computer Science and Technology, Hefei, China and CityU-USTC Advanced Research Institute, Joint Research Lab of Excellence, Suzhou, Chi ...;City University of Hong Kong, Department of Computer Science, Hong Kong SAR, China and CityU-USTC Advanced Research Institute, Joint Research Lab of Excellence, Suzhou, China
Venue:
Knowledge and Information Systems
Year:
2010

Citing 0
Cited 7

Transferring topical knowledge from auxiliary long texts for short text clustering
Proceedings of the 20th ACM international conference on Information and knowledge management

Decision support for improved service effectiveness using domain aware text mining
Knowledge-Based Systems

Analyzing and mining a code search engine usage log
Empirical Software Engineering

Using semi-structured data for assessing research paper similarity
Information Sciences: an International Journal

Computing similarity between items in a digital library of cultural heritage
Journal on Computing and Cultural Heritage (JOCCH)

Extended information inference model for unsupervised categorization of web short texts
Journal of Information Science

Short text classification by detecting information path
Proceedings of the 22nd ACM international conference on Conference on information & knowledge management

Abstract

In this paper, we propose a new method for measuring the similarity between two short text snippets by comparing each of them with a set of probabilistic topics. Specifically, our method first finds the distinguishing terms between the two snippets and compares them with a series of probabilistic topics extracted by a Gibbs sampling algorithm. The relationship between the distinguishing terms of the two snippets is then discovered by examining their probabilities under each topic. The similarity between the two snippets is calculated from their common terms and the relationship between their distinguishing terms. Extensive experiments on paraphrasing and question categorization show that the proposed method calculates the similarity of short text snippets more accurately than other methods, including the pure TF-IDF measure.
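
The idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' formula, which the abstract does not give. It assumes a hypothetical table of per-topic term probabilities (TOPIC_TERM_PROB, with invented values standing in for topics that would be extracted by Gibbs sampling), whitespace tokenization, a cosine comparison of averaged topic vectors, and an assumed combination rule that adds topical relatedness of the distinguishing terms on top of the common-term overlap.

```python
from math import sqrt

# Hypothetical P(term | topic) values for three topics. These numbers are
# invented for illustration; a real system would estimate them from a corpus.
TOPIC_TERM_PROB = {
    "car":        [0.30, 0.01, 0.01],
    "automobile": [0.25, 0.01, 0.02],
    "bank":       [0.01, 0.40, 0.02],
    "loan":       [0.01, 0.35, 0.01],
}
N_TOPICS = 3


def topic_vector(terms):
    """Average the per-topic probabilities of the terms that the table knows."""
    known = [TOPIC_TERM_PROB[t] for t in terms if t in TOPIC_TERM_PROB]
    if not known:
        return [0.0] * N_TOPICS
    return [sum(col) / len(known) for col in zip(*known)]


def cosine(u, v):
    """Cosine similarity; returns 0.0 when either vector is all zeros."""
    norm_u = sqrt(sum(x * x for x in u))
    norm_v = sqrt(sum(x * x for x in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(u, v)) / (norm_u * norm_v)


def similarity(text_a, text_b):
    """Combine common-term overlap with topical relatedness of the rest.

    Assumed combination rule (not from the paper):
        score = overlap + (1 - overlap) * topical
    where overlap is the Jaccard coefficient of the two term sets and
    topical compares the distinguishing terms of each snippet.
    """
    terms_a = set(text_a.lower().split())
    terms_b = set(text_b.lower().split())
    union = terms_a | terms_b
    if not union:
        return 0.0
    overlap = len(terms_a & terms_b) / len(union)
    # Distinguishing terms: those that appear in only one of the snippets.
    only_a = terms_a - terms_b
    only_b = terms_b - terms_a
    topical = cosine(topic_vector(only_a), topic_vector(only_b))
    return overlap + (1.0 - overlap) * topical


if __name__ == "__main__":
    print(similarity("buy a car", "buy a automobile"))
    print(similarity("buy a car", "buy a bank"))
```

Under these assumptions, two snippets that differ only in topically related terms (for example "car" and "automobile") score higher than two snippets whose distinguishing terms fall under different topics, even though the common-term overlap is the same in both cases.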