Both document clustering and word clustering are important and well-studied problems. Using the vector space model, a document collection can be represented as a word-document matrix. In this paper, we present the novel idea of modeling the document collection as a bipartite graph between documents and words. Using this model, we pose the clustering problem as a graph partitioning problem and give a new spectral algorithm that simultaneously yields a clustering of documents and words. This co-clustering algorithm uses the second left and right singular vectors of an appropriately scaled word-document matrix to yield good bipartitionings. In fact, it can be shown that these singular vectors solve a real-valued relaxation of the discrete optimal graph bipartitioning problem. We present several experimental results showing that the resulting co-clustering algorithm works well in practice and is robust in the presence of noise.
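The bipartitioning step described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract only says "appropriately scaled", so the scaling used here is an assumption: D1 and D2 are taken to be the diagonal matrices of row (word) and column (document) sums, and the matrix decomposed is A_n = D1^{-1/2} A D2^{-1/2}. The second singular vectors are mapped back through D1^{-1/2} and D2^{-1/2} and stacked into one vector z, which is then split with a simple one-dimensional 2-means pass (also an assumption). The names `spectral_cobipartition` and `_two_means_1d` are hypothetical.

```python
import numpy as np


def _two_means_1d(z, n_iter=100):
    """Hypothetical helper: Lloyd's 2-means on a 1-D array; returns labels in {0, 1}."""
    centers = np.array([z.min(), z.max()])
    labels = np.zeros(len(z), dtype=int)
    for _ in range(n_iter):
        # Assign each point to the nearer of the two centers.
        labels = (np.abs(z - centers[1]) < np.abs(z - centers[0])).astype(int)
        # Recompute centers; keep the old center if a cluster becomes empty.
        new_centers = np.array([
            z[labels == k].mean() if np.any(labels == k) else centers[k]
            for k in (0, 1)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def spectral_cobipartition(A, n_iter=100):
    """Hypothetical sketch: co-bipartition words (rows) and documents (columns).

    A is an m x n nonnegative word-document matrix. Returns
    (word_labels, doc_labels), each with values in {0, 1}.
    """
    A = np.asarray(A, dtype=float)
    m, n = A.shape
    if min(m, n) < 2:
        raise ValueError("need at least two words and two documents")
    # Assumed scaling: D1 = diag(row sums), D2 = diag(column sums).
    d1, d2 = A.sum(axis=1), A.sum(axis=0)
    if np.any(d1 == 0) or np.any(d2 == 0):
        raise ValueError("every word and document needs a nonzero entry")
    d1_isqrt, d2_isqrt = 1.0 / np.sqrt(d1), 1.0 / np.sqrt(d2)
    # A_n = D1^{-1/2} A D2^{-1/2}, applied via broadcasting.
    An = d1_isqrt[:, None] * A * d2_isqrt[None, :]
    # NumPy returns singular values in decreasing order.
    U, _, Vt = np.linalg.svd(An, full_matrices=False)
    u2, v2 = U[:, 1], Vt[1, :]
    # Stack the rescaled second singular vectors: z = [D1^{-1/2} u2; D2^{-1/2} v2].
    z = np.concatenate([d1_isqrt * u2, d2_isqrt * v2])
    labels = _two_means_1d(z, n_iter)
    return labels[:m], labels[m:]


# Toy matrix: words 0-1 co-occur mostly with docs 0-1, words 2-3 with docs 2-3,
# plus a little cross-block "noise" so the bipartite graph stays connected.
A = np.array([
    [5, 4, 1, 0],
    [4, 5, 0, 1],
    [1, 0, 5, 4],
    [0, 1, 4, 5],
])
word_labels, doc_labels = spectral_cobipartition(A)
print(word_labels, doc_labels)
```

Because words and documents are embedded in the same vector z, a single split assigns each word to the same side as the documents it occurs in most. That is the "simultaneous" clustering the abstract refers to. The labels 0 and 1 are arbitrary, since the sign of a singular vector pair is not fixed.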