Document clustering by concept factorization

Authors: Wei Xu; Yihong Gong
Affiliations: NEC Laboratories America, Inc., Cupertino, CA (both authors)
Venue: Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval
Year: 2004

References (11):

Distributional clustering of words for text classification. Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval.
Normalized Cuts and Image Segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence.
Fast supervised dimensionality reduction algorithm with applications to document categorization & retrieval. Proceedings of the ninth international conference on Information and knowledge management.
Co-clustering documents and words using bipartite spectral graph partitioning. Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining.
Bipartite graph partitioning and data clustering. Proceedings of the tenth international conference on Information and knowledge management.
Document clustering with cluster refinement and model selection capabilities. Proceedings of the 25th annual international ACM SIGIR conference on Research and development in information retrieval (SIGIR '02).
Concept Decompositions for Large Sparse Text Data Using Clustering. Machine Learning.
A Min-max Cut Algorithm for Graph Partitioning and Data Clustering. Proceedings of the 2001 IEEE International Conference on Data Mining (ICDM '01).
Document clustering based on non-negative matrix factorization. Proceedings of the 26th annual international ACM SIGIR conference on Research and development in information retrieval.
Multilevel spectral hypergraph partitioning with arbitrary vertex sizes. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.
An introduction to kernel-based learning algorithms. IEEE Transactions on Neural Networks.

Cited by (32):

A general model for clustering binary data. Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining.
Document Clustering Using Locality Preserving Indexing. IEEE Transactions on Knowledge and Data Engineering.
A Unified View on Clustering Binary Data. Machine Learning.
A partitioning based algorithm to fuzzy co-cluster documents and words. Pattern Recognition Letters.
A comprehensive comparison study of document clustering for a biomedical digital library MEDLINE. Proceedings of the 6th ACM/IEEE-CS joint conference on Digital libraries.
Integration of semantic-based bipartite graph representation and mutual refinement strategy for biomedical literature clustering. Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining.
Using cluster validation criterion to identify optimal feature subset and cluster number for document clustering. Information Processing and Management: an International Journal.
Structural and temporal analysis of the blogosphere through community factorization. Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining.
Possibilistic fuzzy co-clustering of large document collections. Pattern Recognition.
Biomedical ontology improves biomedical literature clustering performance: a comparison study. International Journal of Bioinformatics Research and Applications.
Utilizing phrase-similarity measures for detecting and clustering informative RSS news articles. Integrated Computer-Aided Engineering.
Document Clustering Based on Spectral Clustering and Non-negative Matrix Factorization. Proceedings of the 21st international conference on Industrial, Engineering and Other Applications of Applied Intelligent Systems: New Frontiers in Applied Artificial Intelligence (IEA/AIE '08).
Generating Fuzzy Equivalence Classes on RSS News Articles for Retrieving Correlated Information. Proceedings of the international conference on Computational Science and Its Applications, Part II (ICCSA '08).
Learning Bidirectional Similarity for Collaborative Filtering. Proceedings of the 2008 European Conference on Machine Learning and Knowledge Discovery in Databases - Part I (ECML PKDD '08).
Clustering based on matrix approximation: a unifying view. Knowledge and Information Systems.
Using backward elimination with a new model order reduction algorithm to select best double mixture model for document clustering. Expert Systems with Applications: An International Journal.
Detect and track latent factors with online nonnegative matrix factorization. Proceedings of the 20th international joint conference on Artificial intelligence (IJCAI '07).
Document Clustering with Cluster Refinement and Non-negative Matrix Factorization. Proceedings of the 16th International Conference on Neural Information Processing: Part II (ICONIP '09).
Mining fuzzy frequent itemsets for hierarchical document clustering. Information Processing and Management: an International Journal.
Document clustering using NMF and fuzzy relation. Proceedings of the 5th International Conference on Ubiquitous Information Management and Communication.
Learning bidirectional asymmetric similarity for collaborative filtering via matrix factorization. Data Mining and Knowledge Discovery.
Integrating Document Clustering and Multidocument Summarization. ACM Transactions on Knowledge Discovery from Data (TKDD).
Discriminative concept factorization for data representation. Neurocomputing.
Representing document as dependency graph for document clustering. Proceedings of the 20th ACM international conference on Information and knowledge management.
Improving quality of search results clustering with approximate matrix factorisations. Proceedings of the 28th European conference on Advances in Information Retrieval (ECIR '06).
Locality-constrained concept factorization. Proceedings of the Twenty-Second international joint conference on Artificial Intelligence - Volume Two (IJCAI '11).
Clustering and understanding documents via discrimination information maximization. Proceedings of the 16th Pacific-Asia conference on Advances in Knowledge Discovery and Data Mining - Volume Part I (PAKDD '12).
Using maximal spanning trees and word similarity to generate hierarchical clusters of non-redundant RSS news articles. Journal of Intelligent Information Systems.
Feature selection for unsupervised learning. Proceedings of the 19th international conference on Neural Information Processing - Volume Part III (ICONIP '12).
On-line learning parts-based representation via incremental orthogonal projective non-negative matrix factorization. Signal Processing.
Discriminative Orthogonal Nonnegative matrix factorization with flexibility for data representation. Expert Systems with Applications: An International Journal.
Pairwise constrained concept factorization for data representation. Neural Networks.

Abstract

In this paper, we propose a new data clustering method called concept factorization, which models each concept as a linear combination of the data points and each data point as a linear combination of the concepts. Under this model, clustering amounts to computing the two sets of linear coefficients, which are obtained by finding the non-negative solution that minimizes the reconstruction error of the data points. The cluster label of each data point can then be easily derived from the obtained coefficients. Unlike clustering based on non-negative matrix factorization (NMF) (Xu et al., 2003), the method can be applied to data containing negative values and can be implemented in kernel space. Our experimental results show that the proposed method and its variations perform best among the 11 algorithms and their variations that we evaluated on both the TDT2 and Reuters-21578 corpora. Beyond its good performance, the method has the further merit that its clustering results can be derived easily and reliably.
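To make the model concrete: if the data points are the columns of a term-document matrix X, one way to write "concepts as combinations of data points, data points as combinations of concepts" is X ≈ X W V^T with non-negative W and V, chosen to minimize the reconstruction error ||X - X W V^T||_F^2. The NumPy sketch below illustrates this under stated assumptions and is not the authors' implementation. The function name `concept_factorization`, its parameters, the NMF-style multiplicative updates (derived here from the gradient of that objective), and the step that rescales each concept to unit norm before labeling every point by its largest coefficient are all illustrative choices. The sketch also assumes that K = X^T X is entrywise non-negative, as it is for non-negative features such as tf-idf. The negative-data case mentioned above needs additional handling that is not shown.

```python
import numpy as np


def concept_factorization(X, k, n_iter=200, seed=0, K=None, eps=1e-9):
    """Hypothetical concept-factorization sketch (illustrative only).

    X: (m, n) array whose n columns are data points (e.g. tf-idf documents).
    Approximates X by X @ W @ V.T with W, V >= 0, both of shape (n, k):
    column j of X @ W is concept j; row i of V holds data point i's
    coefficients over the k concepts.
    Assumes K (default X.T @ X) is entrywise non-negative.
    """
    if K is None:
        K = X.T @ X                  # the updates use the data only through K
    n = K.shape[0]
    rng = np.random.default_rng(seed)
    W = rng.random((n, k))
    V = rng.random((n, k))
    for _ in range(n_iter):
        # NMF-style multiplicative updates for ||X - X W V^T||_F^2, built from
        # its gradients -2KV + 2KWV^TV (w.r.t. W) and -2KW + 2VW^TKW (w.r.t. V)
        W *= (K @ V) / (K @ W @ (V.T @ V) + eps)
        V *= (K @ W) / (V @ (W.T @ K @ W) + eps)
    # Rescale each concept X w_j to (approximately) unit norm and compensate
    # in V, which leaves the product X W V^T unchanged
    norms = np.sqrt(np.einsum("ij,ik,kj->j", W, K, W)) + eps
    W = W / norms
    V = V * norms
    labels = V.argmax(axis=1)        # cluster = concept with the largest coefficient
    return W, V, labels
```

For example, `W, V, labels = concept_factorization(X, k=2)` on a small non-negative term-document matrix returns one label per column of X. Because the updates touch the data only through K, passing a kernel Gram matrix as `K` is one plausible reading of the kernel-space remark above. That reading is an assumption of this sketch.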