Semi-supervised Document Clustering with Simultaneous Text Representation and Categorization

Authors:
Yanhua Chen;Lijun Wang;Ming Dong
Affiliations:
Machine Vision and Pattern Recognition Lab, Department of Computer Science, Wayne State University, Detroit, USA 48202 (all authors)
Venue:
ECML PKDD '09 Proceedings of the European Conference on Machine Learning and Knowledge Discovery in Databases: Part I
Year:
2009

Abstract

To derive high-quality information from text, the field of text mining has advanced swiftly from simple document clustering to co-clustering with words and categories. However, document co-clustering without any prior knowledge or background information is a challenging problem. In this paper, we propose a Semi-Supervised Non-negative Matrix Factorization (SS-NMF) framework for document co-clustering. Our method computes new word-document and document-category matrices by incorporating user-provided constraints through simultaneous distance metric learning and modality selection. Using an iterative algorithm, we then tri-factorize the new matrices to infer the document, category, and word clusters. Theoretically, we show the convergence and correctness of SS-NMF co-clustering and its advantages over existing approaches. Through extensive experiments on publicly available data sets, we demonstrate the superior performance of SS-NMF for document co-clustering.
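As a rough illustration of the tri-factorization step mentioned in the abstract, the sketch below factors a small non-negative word-document matrix X into F S G^T with generic multiplicative updates that reduce the squared error ||X - F S G^T||^2. It is not the authors' SS-NMF algorithm: the constraint-driven metric learning and modality selection that produce the "new" matrices are omitted, and the function names, the toy matrix, the cluster counts, and the iteration count are all assumptions made for this example.

```python
import random

def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def scale(M, numer, denom, eps=1e-9):
    """Element-wise multiplicative update: M * numer / denom."""
    return [[m * n / (d + eps) for m, n, d in zip(rm, rn, rd)]
            for rm, rn, rd in zip(M, numer, denom)]

def squared_error(X, F, S, G):
    """Squared Frobenius error between X and F S G^T."""
    approx = matmul(matmul(F, S), transpose(G))
    return sum((x - a) ** 2 for rx, ra in zip(X, approx) for x, a in zip(rx, ra))

def tri_factorize(X, k_words, k_docs, iters=200, seed=0):
    """Approximate a non-negative X (words x docs) as F S G^T.

    Generic unconstrained NMF tri-factorization (hypothetical helper);
    no user-provided constraints are used here.
    """
    rng = random.Random(seed)

    def rand(r, c):
        # Strictly positive start so every update denominator is positive.
        return [[rng.random() + 0.1 for _ in range(c)] for _ in range(r)]

    n, m = len(X), len(X[0])
    F, S, G = rand(n, k_words), rand(k_words, k_docs), rand(m, k_docs)
    Xt = transpose(X)
    errors = [squared_error(X, F, S, G)]
    for _ in range(iters):
        # F <- F * (X G S^T) / (F S G^T G S^T)
        GSt = matmul(G, transpose(S))
        F = scale(F, matmul(X, GSt), matmul(F, matmul(transpose(GSt), GSt)))
        # G <- G * (X^T F S) / (G S^T F^T F S)
        FS = matmul(F, S)
        G = scale(G, matmul(Xt, FS), matmul(G, matmul(transpose(FS), FS)))
        # S <- S * (F^T X G) / (F^T F S G^T G)
        Ft, Gt = transpose(F), transpose(G)
        numer = matmul(matmul(Ft, X), G)
        denom = matmul(matmul(matmul(Ft, F), S), matmul(Gt, G))
        S = scale(S, numer, denom)
        errors.append(squared_error(X, F, S, G))
    return F, S, G, errors

# Toy word-document counts (made up): rows are words, columns are documents.
X = [[3, 2, 0, 0],
     [2, 3, 0, 0],
     [4, 3, 0, 0],
     [0, 0, 2, 3],
     [0, 0, 3, 2],
     [0, 0, 3, 4]]
F, S, G, errors = tri_factorize(X, k_words=2, k_docs=2)
word_clusters = [row.index(max(row)) for row in F]  # largest entry per word
doc_clusters = [row.index(max(row)) for row in G]   # largest entry per document
print("error before/after:", errors[0], errors[-1])
print("word clusters:", word_clusters, "document clusters:", doc_clusters)
```

Reading cluster labels from the largest entry in each row of F and G is one common convention, not something the abstract specifies. In a semi-supervised setting, the matrix passed to `tri_factorize` would first be adjusted using the pairwise constraints, which this sketch leaves out.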