Short text clustering by finding core terms

  • Authors:
  • Xingliang Ni;Xiaojun Quan;Zhi Lu;Liu Wenyin;Bei Hua

  • Affiliations:
  • University of Science and Technology of China, School of Computer Science and Technology, Hefei, China and City University of Hong Kong, Department of Computer Science, HKSAR, China and CityU-USTC ...;City University of Hong Kong, Department of Computer Science, HKSAR, China;City University of Hong Kong, Department of Computer Science, HKSAR, China;University of Science and Technology of China, School of Computer Science and Technology, Hefei, China and City University of Hong Kong, Department of Computer Science, HKSAR, China and CityU-USTC ...;University of Science and Technology of China, School of Computer Science and Technology, Hefei, China and CityU-USTC Advanced Research Institute, Joint Research Lab of Excellence, Suzhou, China

  • Venue:
  • Knowledge and Information Systems
  • Year:
  • 2011

Abstract

A new clustering strategy, TermCut, is presented to cluster short text snippets by finding core terms in the corpus. We model the collection of short text snippets as a graph in which each vertex represents a short text snippet and each weighted edge measures the relationship between the two vertices it connects. TermCut is then applied to recursively select a core term and bisect the graph such that the snippets in one part of the graph contain the term, whereas those in the other part do not. We apply the proposed method to different types of short text snippets, including questions and search results. Experimental results show that the proposed method outperforms state-of-the-art clustering algorithms for clustering short text snippets.
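
The recursive bisection described in the abstract can be illustrated with a minimal Python sketch. The abstract does not specify how edge weights are computed or how a core term is selected, so everything specific here is an assumption made for illustration: edges are weighted by Jaccard overlap of whitespace tokens, and the core term is the one whose split maximizes average within-part similarity minus average cross-part similarity. The names `term_bisect`, `split_score`, and `min_size` are hypothetical and are not taken from the paper.

```python
from itertools import combinations


def tokenize(text):
    """Lowercase whitespace tokenization (an assumption, not the paper's preprocessing)."""
    return set(text.lower().split())


def jaccard(a, b):
    """Assumed edge weight: Jaccard overlap between two token sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def mean(values):
    values = list(values)
    return sum(values) / len(values) if values else 0.0


def split_score(sim, part_a, part_b):
    """Assumed selection criterion: within-part similarity minus cross-part similarity."""
    within = [sim[i][j] for part in (part_a, part_b) for i, j in combinations(part, 2)]
    across = [sim[i][j] for i in part_a for j in part_b]
    return mean(within) - mean(across)


def term_bisect(snippets, min_size=2):
    """Recursively bisect snippets on a chosen term; returns clusters as lists of indices."""
    token_sets = [tokenize(s) for s in snippets]
    n = len(snippets)
    # Weighted graph as a dense similarity matrix: one vertex per snippet.
    sim = [[jaccard(token_sets[i], token_sets[j]) for j in range(n)] for i in range(n)]

    def recurse(indices):
        # Stop when the group cannot be split into two parts of at least min_size.
        if len(indices) < 2 * min_size:
            return [indices]
        best = None
        candidates = set().union(*(token_sets[i] for i in indices))
        for term in sorted(candidates):  # sorted for deterministic tie-breaking
            has = [i for i in indices if term in token_sets[i]]
            lacks = [i for i in indices if term not in token_sets[i]]
            if len(has) < min_size or len(lacks) < min_size:
                continue
            score = split_score(sim, has, lacks)
            if best is None or score > best[0]:
                best = (score, has, lacks)
        # Keep the group whole if no candidate term yields a useful split.
        if best is None or best[0] <= 0:
            return [indices]
        _, has, lacks = best
        return recurse(has) + recurse(lacks)

    return recurse(list(range(n)))


if __name__ == "__main__":
    questions = [
        "python list sort",
        "python dict sort",
        "java thread pool",
        "java thread lock",
    ]
    for cluster in term_bisect(questions):
        print([questions[i] for i in cluster])
```

Each split places the snippets that contain the chosen term on one side and those that do not on the other, which mirrors the bisection step in the abstract; the scoring rule and stopping condition are stand-ins and would need to be replaced with the paper's own definitions.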