Semi-supervised Document Clustering with Simultaneous Text Representation and Categorization

  • Authors:
  • Yanhua Chen; Lijun Wang; Ming Dong

  • Affiliations:
  • Machine Vision and Pattern Recognition Lab, Department of Computer Science, Wayne State University, Detroit, USA 48202 (all three authors)

  • Venue:
  • ECML PKDD '09 Proceedings of the European Conference on Machine Learning and Knowledge Discovery in Databases: Part I
  • Year:
  • 2009

Abstract

In order to derive high-quality information from text, the field of text mining has advanced swiftly from simple document clustering to co-clustering with words and categories. However, document co-clustering without any prior knowledge or background information is a challenging problem. In this paper, we propose a Semi-Supervised Non-negative Matrix Factorization (SS-NMF) framework for document co-clustering. Our method computes new word-document and document-category matrices by incorporating user-provided constraints through simultaneous distance metric learning and modality selection. Using an iterative algorithm, we then perform tri-factorization of the new matrices to infer the document, category, and word clusters. On the theoretical side, we show the convergence and correctness of SS-NMF co-clustering and its advantages over existing approaches. Through extensive experiments on publicly available data sets, we demonstrate the superior performance of SS-NMF for document co-clustering.
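The abstract names non-negative tri-factorization as the core clustering step but does not give its update rules. As a rough illustration only, the sketch below factorizes a single nonnegative word-document matrix as X ≈ F S Gᵀ using the standard multiplicative updates for squared Frobenius loss, then reads word clusters from the rows of F and document clusters from the rows of G. It is not SS-NMF itself: it omits the user-provided constraints, the distance metric learning and modality selection, and the joint factorization with the document-category matrix. The function name `nmtf`, its parameters, and the toy matrix are hypothetical.

```python
import numpy as np


def nmtf(X, k_rows, k_cols, n_iter=1000, eps=1e-9, seed=0):
    """Hypothetical helper: plain nonnegative tri-factorization X ~ F @ S @ G.T.

    Minimizes ||X - F S G^T||_F^2 with multiplicative updates, which keep
    every factor nonnegative. No semi-supervised constraints are used here.
    """
    rng = np.random.default_rng(seed)
    m, n = X.shape
    # Strictly positive random initialization (zeros would stay zero forever).
    F = rng.random((m, k_rows)) + eps  # row (word) cluster indicators
    S = rng.random((k_rows, k_cols)) + eps  # word-cluster/doc-cluster association
    G = rng.random((n, k_cols)) + eps  # column (document) cluster indicators
    for _ in range(n_iter):
        # Each update multiplies by (negative gradient part) / (positive gradient part).
        F *= (X @ G @ S.T) / (F @ S @ G.T @ G @ S.T + eps)
        G *= (X.T @ F @ S) / (G @ S.T @ F.T @ F @ S + eps)
        S *= (F.T @ X @ G) / (F.T @ F @ S @ G.T @ G + eps)
    return F, S, G


# Toy word-document matrix (rows = words, columns = documents) with two
# obvious word groups and two obvious document groups.
X_toy = np.array(
    [[5, 5, 0, 0],
     [5, 5, 0, 0],
     [0, 0, 5, 5],
     [0, 0, 5, 5]],
    dtype=float,
)
F, S, G = nmtf(X_toy, k_rows=2, k_cols=2)
word_labels = F.argmax(axis=1)  # hard word clusters
doc_labels = G.argmax(axis=1)  # hard document clusters
print("word clusters:", word_labels)
print("document clusters:", doc_labels)
```

Cluster labels are only defined up to permutation, so the two document groups may come out as `[0, 0, 1, 1]` or `[1, 1, 0, 0]`. The extra factor S is what distinguishes tri-factorization from ordinary two-factor NMF: it lets the number of word clusters differ from the number of document clusters and records how strongly each word cluster is associated with each document cluster.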