Co-clustering Documents and Words Using Bipartite Spectral Graph Partitioning

Authors:
Inderjit S. Dhillon
Affiliations:
-
Venue:
Co-clustering Documents and Words Using Bipartite Spectral Graph Partitioning
Year:
2001

Citing 0
Cited 9

Document clustering via adaptive subspace iteration

Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval
A method of cluster-based indexing of textual data

COLING '02 Proceedings of the 19th international conference on Computational linguistics - Volume 1
Two-dimensional clustering for text categorization

COLING-02 proceedings of the 6th conference on Natural language learning - Volume 20
Algorithms for clustering high dimensional and distributed data

Intelligent Data Analysis
Multinomial mixture model with feature selection for text clustering

Knowledge-Based Systems
A Clustering Framework Based on Adaptive Space Mapping and Rescaling

AIRS '09 Proceedings of the 5th Asia Information Retrieval Symposium on Information Retrieval Technology
Automatic taxonomy generation: issues and possibilities

IFSA'03 Proceedings of the 10th international fuzzy systems association World Congress conference on Fuzzy sets and systems
Compositional matrix-space models for sentiment analysis

EMNLP '11 Proceedings of the Conference on Empirical Methods in Natural Language Processing
A fast and effective partitioning algorithm for document clustering

ICDEM'10 Proceedings of the Second international conference on Data Engineering and Management

Abstract

Both document clustering and word clustering are important and well-studied problems. By using the vector space model, a document collection may be represented as a word-document matrix. In this paper, we present the novel idea of modeling the document collection as a bipartite graph between documents and words. Using this model, we pose the clustering problem as a graph partitioning problem and give a new spectral algorithm that simultaneously yields a clustering of documents and words. This co-clustering algorithm uses the second left and right singular vectors of an appropriately scaled word-document matrix to yield good bipartitionings. In fact, it can be shown that these singular vectors give a real relaxation to the optimal solution of the graph bipartitioning problem. We present several experimental results to verify that the resulting co-clustering algorithm works well in practice and is robust in the presence of noise.
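
The bipartitioning step described in the abstract can be illustrated with a minimal sketch. It rests on assumptions the abstract does not spell out: the "appropriate scaling" is taken here to be degree normalization, D1^(-1/2) A D2^(-1/2), where D1 and D2 hold the row (word) and column (document) sums; the two clusters are read off by thresholding the rescaled second singular vectors at zero, a simplification chosen for brevity; and the function name `cocluster_bipartition` and the toy matrix are hypothetical.

```python
import numpy as np

def cocluster_bipartition(A):
    """Split words (rows) and documents (columns) of A into two co-clusters.

    Assumption: A is a non-negative word-document matrix in which every
    word and every document has a positive total weight.
    """
    A = np.asarray(A, dtype=float)
    d1 = A.sum(axis=1)  # word degrees (row sums)
    d2 = A.sum(axis=0)  # document degrees (column sums)
    if np.any(d1 <= 0) or np.any(d2 <= 0):
        raise ValueError("every row and column must have a positive sum")

    d1_inv_sqrt = 1.0 / np.sqrt(d1)
    d2_inv_sqrt = 1.0 / np.sqrt(d2)

    # Assumed scaling: A_n = D1^(-1/2) A D2^(-1/2)
    A_n = d1_inv_sqrt[:, None] * A * d2_inv_sqrt[None, :]

    # Singular values come back in descending order, so index 1
    # selects the second left and right singular vectors.
    U, s, Vt = np.linalg.svd(A_n, full_matrices=False)
    u2, v2 = U[:, 1], Vt[1, :]

    # Undo the scaling so words and documents share one embedding.
    z_words = d1_inv_sqrt * u2
    z_docs = d2_inv_sqrt * v2

    # Simplification: split at zero rather than clustering the values.
    word_labels = (z_words > 0).astype(int)
    doc_labels = (z_docs > 0).astype(int)
    return word_labels, doc_labels

# Toy matrix (made up): 6 words x 5 documents, two blocks with weak cross-links.
A = np.array([
    [3, 2, 0, 0, 0],
    [2, 3, 0, 0, 0],
    [2, 2, 0, 0, 1],
    [0, 0, 3, 2, 2],
    [0, 1, 2, 3, 2],
    [0, 0, 2, 2, 3],
])
word_labels, doc_labels = cocluster_bipartition(A)
print("word labels:", word_labels)
print("doc labels: ", doc_labels)
```

Which cluster receives label 0 and which receives label 1 depends on the sign convention of the SVD routine, so only the grouping is meaningful. Words and documents that share a label form one co-cluster.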