Text clustering using one-mode projection of document-word bipartite graphs

Authors:
Ajitesh Srivastava;Axel J. Soto;Evangelos Milios
Affiliations:
Birla Institute of Technology and Science, Pilani, India;Dalhousie University, Halifax, Canada;Dalhousie University, Halifax, Canada
Venue:
Proceedings of the 28th Annual ACM Symposium on Applied Computing
Year:
2013

Abstract

Many real-life networks have an underlying bipartite structure from which the similarity between two nodes or data instances can be defined. In a document corpus, for example, the similarity between a pair of documents can be assumed to arise from the words that occur in both of them, and this document-word co-occurrence relationship can be modeled as a bipartite graph. A document similarity graph can then be obtained by taking a one-mode projection of the bipartite graph, a popular technique for studying networks that arise from a bipartite structure. A graph-based clustering algorithm can be applied to this projection graph to obtain clusters of documents. In this paper we study the use of the one-mode projection of the document-word bipartite graph, followed by a modularity optimization algorithm, to cluster the documents. In particular, we propose an alternative and faster algorithm that works in two steps: first, it finds the documents that are easy to cluster; then, it assigns the remaining documents to existing or new clusters. We show that the algorithms based on one-mode projections perform significantly better than traditional clustering approaches. In addition, our method has similar or better clustering performance than the most popular algorithm for modularity optimization, while also running four times faster.
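
The two building blocks named in the abstract, one-mode projection and modularity, can be illustrated with a minimal sketch. This is not the authors' algorithm. It assumes the simplest projection weighting (the number of distinct words two documents share), which the abstract does not specify, and it only evaluates the modularity of a given partition rather than optimizing it. The function names `one_mode_projection` and `modularity` and the toy corpus are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def one_mode_projection(docs):
    """Project a document-word bipartite graph onto its documents.

    `docs` maps a document id to an iterable of words. The result maps a
    sorted document pair (a, b) to the number of distinct words they share
    (an assumed weighting, chosen for simplicity).
    """
    # Bipartite adjacency seen from the word side: word -> documents using it.
    word_to_docs = defaultdict(set)
    for doc_id, words in docs.items():
        for word in set(words):
            word_to_docs[word].add(doc_id)

    # Every word links all pairs of documents that contain it.
    weights = defaultdict(int)
    for doc_ids in word_to_docs.values():
        for a, b in combinations(sorted(doc_ids), 2):
            weights[(a, b)] += 1
    return dict(weights)


def modularity(weights, clusters):
    """Weighted Newman modularity of a partition of the projection graph.

    `clusters` is a list of sets that must cover every document that has
    at least one edge in `weights`.
    """
    total = sum(weights.values())  # total edge weight m
    if total == 0:
        return 0.0
    label = {doc: i for i, cluster in enumerate(clusters) for doc in cluster}

    internal = defaultdict(int)  # edge weight inside each cluster
    strength = defaultdict(int)  # summed weighted degree of each cluster
    for (a, b), w in weights.items():
        strength[label[a]] += w
        strength[label[b]] += w
        if label[a] == label[b]:
            internal[label[a]] += w

    return sum(
        internal[c] / total - (strength[c] / (2 * total)) ** 2
        for c in strength
    )


# Toy corpus: two topical groups bridged by the single shared word "data".
docs = {
    "d1": ["graph", "cluster", "node"],
    "d2": ["graph", "cluster", "data"],
    "d3": ["protein", "gene", "data"],
    "d4": ["protein", "gene", "cell"],
}
projection = one_mode_projection(docs)
by_topic = modularity(projection, [{"d1", "d2"}, {"d3", "d4"}])
all_in_one = modularity(projection, [{"d1", "d2", "d3", "d4"}])
print(projection)
print(by_topic > all_in_one)
```

In this toy corpus the topical split scores a higher modularity than the single-cluster partition, which is the signal a modularity optimization algorithm searches for on the projection graph.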