Clustering short text using Ncut-weighted non-negative matrix factorization

  • Authors:
  • Xiaohui Yan;Jiafeng Guo;Shenghua Liu;Xue-qi Cheng;Yanfeng Wang

  • Affiliations:
  • Institute of Computing Technology, CAS, Beijing, China;Institute of Computing Technology, CAS, Beijing, China;Institute of Computing Technology, CAS, Beijing, China;Institute of Computing Technology, CAS, Beijing, China;Sogou Inc., Beijing, China

  • Venue:
  • Proceedings of the 21st ACM international conference on Information and knowledge management
  • Year:
  • 2012

Abstract

Non-negative matrix factorization (NMF) has been successfully applied to document clustering. However, experiments on short texts, such as microblogs, Q&A documents, and news titles, suggest that NMF performs unsatisfactorily. A major reason is that traditional term weighting schemes, such as binary weights and tf-idf, fail to capture the discriminative power and importance of terms in short texts because the data are sparse. To tackle this problem, we propose a novel term weighting scheme for NMF, derived from the Normalized Cut (Ncut) problem on the term affinity graph. Unlike idf, which emphasizes discriminability at the document level, Ncut weighting measures a term's discriminability at the term level. Experiments on two data sets show that our weighting scheme significantly boosts NMF's performance on short text clustering.
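
The abstract does not state the weighting formula, so the following is only a rough sketch of how a term-level, Ncut-style weighting could be combined with NMF. It assumes (this is an assumption, not something confirmed by the abstract) that the term affinity graph is the term co-occurrence matrix S = A Aᵀ of a term-by-document matrix A, and that each term is scaled by the inverse square root of its degree in that graph, as in the usual normalized-cut normalization. The NMF step uses the standard multiplicative updates for the Frobenius objective. The function names `ncut_term_weights`, `nmf`, and `cluster_short_texts` are hypothetical.

```python
import numpy as np


def ncut_term_weights(A):
    """Per-term weights from a term-by-document matrix A (terms x docs).

    Assumption: affinity S = A @ A.T and weight_i = 1 / sqrt(degree_i),
    mirroring the D^{-1/2} scaling of normalized cut. The paper's exact
    formula may differ.
    """
    S = A @ A.T                      # term affinity (co-occurrence) matrix
    degree = S.sum(axis=1)           # degree of each term in the graph
    weights = np.zeros(degree.shape, dtype=float)
    nonzero = degree > 0             # terms that never occur keep weight 0
    weights[nonzero] = 1.0 / np.sqrt(degree[nonzero])
    return weights


def nmf(X, k, n_iter=200, seed=0, eps=1e-9):
    """Plain NMF (X ~ W @ H) via multiplicative updates, Frobenius loss."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = rng.random((m, k)) + eps
    H = rng.random((k, n)) + eps
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H


def cluster_short_texts(A, k):
    """Weight terms, factorize, and assign each document to its top factor."""
    w = ncut_term_weights(A)
    X = w[:, None] * A               # scale each term row by its weight
    _, H = nmf(X, k)
    return H.argmax(axis=0)          # one cluster label per document


if __name__ == "__main__":
    # Toy term-by-document counts: 4 terms, 4 very short documents.
    A = np.array([[1, 1, 0, 0],
                  [1, 1, 0, 0],
                  [0, 0, 1, 1],
                  [0, 0, 1, 1]], dtype=float)
    print(ncut_term_weights(A))
    print(cluster_short_texts(A, k=2))
```

Under this assumed weighting, a term that co-occurs with many other terms gets a large degree and is down-weighted, which is one way to read "discriminability at the term level" as opposed to the document-frequency view behind idf.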