Short text clustering by finding core terms

Authors:
Xingliang Ni; Xiaojun Quan; Zhi Lu; Liu Wenyin; Bei Hua
Affiliations:
University of Science and Technology of China, School of Computer Science and Technology, Hefei, China and City University of Hong Kong, Department of Computer Science, HKSAR, China and CityU-USTC ...; City University of Hong Kong, Department of Computer Science, HKSAR, China; City University of Hong Kong, Department of Computer Science, HKSAR, China; University of Science and Technology of China, School of Computer Science and Technology, Hefei, China and City University of Hong Kong, Department of Computer Science, HKSAR, China and CityU-USTC ...; University of Science and Technology of China, School of Computer Science and Technology, Hefei, China and CityU-USTC Advanced Research Institute, Joint Research Lab of Excellence, Suzhou, China
Venue:
Knowledge and Information Systems
Year:
2011

Abstract

A new clustering strategy, TermCut, is presented to cluster short text snippets by finding core terms in the corpus. We model the collection of short text snippets as a graph in which each vertex represents a short text snippet and each weighted edge between two vertices measures the relationship between the corresponding snippets. TermCut then recursively selects a core term and bisects the graph so that the snippets in one part contain the term, whereas those in the other part do not. We apply the proposed method to different types of short text snippets, including questions and search results. Experimental results show that the proposed method outperforms state-of-the-art clustering algorithms on short text snippets.
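The term-based bisection described above can be illustrated with a minimal sketch. The abstract does not specify how edge weights are computed or which criterion selects the core term, so this sketch assumes cosine similarity over raw term counts for edge weights and uses the normalized cut (Shi and Malik) as a stand-in scoring criterion; TermCut's actual criterion may differ. The function names `build_graph`, `ncut`, `best_term_split`, and `termcut`, as well as the stopping rule (a target cluster count `k`), are hypothetical choices made for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization, no stemming or stopword removal.
    return text.lower().split()


def cosine(a, b):
    # Cosine similarity between two term-count vectors (Counter lookups of
    # missing terms return 0 without inserting them).
    dot = sum(c * b[t] for t, c in a.items())
    na = math.sqrt(sum(c * c for c in a.values()))
    nb = math.sqrt(sum(c * c for c in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def build_graph(snippets):
    # Vertices are snippets; edge weight w[i][j] is their cosine similarity.
    vecs = [Counter(tokenize(s)) for s in snippets]
    n = len(vecs)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            w[i][j] = w[j][i] = cosine(vecs[i], vecs[j])
    return vecs, w


def ncut(a, b, w):
    # Normalized cut of the bipartition (a, b) within the subgraph on a + b.
    cut = sum(w[i][j] for i in a for j in b)
    nodes = a + b
    assoc_a = sum(w[i][j] for i in a for j in nodes)
    assoc_b = sum(w[i][j] for i in b for j in nodes)
    if assoc_a == 0 or assoc_b == 0:
        return math.inf
    return cut / assoc_a + cut / assoc_b


def best_term_split(cluster, vecs, w):
    # Try every term in the cluster as a candidate core term: snippets that
    # contain it go to one side, the rest to the other. Keep the lowest score;
    # ties go to the alphabetically first term.
    best = None
    for term in sorted({t for i in cluster for t in vecs[i]}):
        has = [i for i in cluster if term in vecs[i]]
        lacks = [i for i in cluster if term not in vecs[i]]
        if not has or not lacks:
            continue  # the term does not bisect this cluster
        score = ncut(has, lacks, w)
        if best is None or score < best[0]:
            best = (score, term, has, lacks)
    return best


def termcut(snippets, k):
    # Recursively bisect the largest splittable cluster until there are k
    # clusters or no cluster can be split on any term.
    vecs, w = build_graph(snippets)
    clusters = [list(range(len(snippets)))]
    core_terms = []
    while len(clusters) < k:
        for cluster in sorted(clusters, key=len, reverse=True):
            split = best_term_split(cluster, vecs, w)
            if split is not None:
                break
        else:
            break  # nothing left to split
        _, term, has, lacks = split
        clusters.remove(cluster)
        clusters.extend([has, lacks])
        core_terms.append(term)
    return clusters, core_terms
```

For example, on four snippets where two mention Python lists and two mention New York pizza, `termcut(snippets, 2)` separates the two topics with a single core term, because any term shared only within one topic yields a zero cut. Splitting the largest cluster first is one simple way to drive the recursion; a real system might instead stop on a score threshold or a minimum cluster size.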