Weighted Feature Subset Non-negative Matrix Factorization and Its Applications to Document Understanding

  • Authors:
  • Dingding Wang;Tao Li;Chris Ding

  • Affiliations:
  • -;-;-

  • Venue:
  • ICDM '10 Proceedings of the 2010 IEEE International Conference on Data Mining
  • Year:
  • 2010

Quantified Score

Hi-index 0.00

Visualization

Abstract

Keyword (Feature) selection enhances and improves many Information Retrieval (IR) tasks such as document categorization, automatic topic discovery, etc. The problem of keyword selection is usually solved using supervised algorithms. In this paper, we propose an unsupervised approach that combines keyword selection and document clustering (topic discovery) together. The proposed approach extends non-negative matrix factorization (NMF) by incorporating a weight matrix to indicate the importance of the keywords. The proposed approach is further extended to a weighted version in which each document is also assigned a weight to assess its importance in the cluster. This work considers both theoretical and empirical weighted feature subset selection for NMF and draws the connection between unsupervised feature selection and data clustering. We apply our proposed approaches to various document understanding tasks including document clustering, summarization, and visualization. Experimental results demonstrate the effectiveness of our approach for these tasks.