Many real-life networks have an underlying bipartite structure from which the similarity between two nodes or data instances can be defined. For example, in a document corpus, the similarity between a pair of documents can be assumed to arise from the words that co-occur in them, and this document-word co-occurrence relationship can be modeled as a bipartite graph. A document similarity graph can then be obtained by taking a one-mode projection of the bipartite graph, a popular technique for studying networks that arise from a bipartite structure. A graph-based clustering algorithm can be applied to this projection graph to obtain clusters of documents. In this paper we study the use of the one-mode projection of the document-word bipartite graph, followed by a modularity optimization algorithm, to cluster the documents. In particular, we propose an alternative and faster algorithm that works in two steps: first, it finds the documents that are easy to cluster; then, it assigns the remaining documents to existing or new clusters. We show that algorithms based on one-mode projections perform significantly better than traditional clustering approaches. In addition, our method achieves similar or better clustering performance than the most popular algorithm for modularity optimization, while running four times faster.
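To make the projection step concrete, the following is a minimal sketch in plain Python, not the paper's implementation. It assumes a toy four-document corpus, weights each projected edge by the number of distinct words two documents share (one of several possible weighting schemes, chosen here for simplicity), and evaluates the standard Newman modularity of a hand-picked partition rather than optimizing it. The names `one_mode_projection`, `modularity`, and `corpus` are illustrative only.

```python
from collections import defaultdict
from itertools import combinations


def one_mode_projection(doc_words):
    """Project a document-word bipartite graph onto its document side.

    Returns {(doc_a, doc_b): weight}, where weight is the number of
    distinct words the two documents share (an assumed weighting).
    """
    # Invert the bipartite adjacency: word -> set of documents containing it.
    word_docs = defaultdict(set)
    for doc, words in doc_words.items():
        for word in set(words):
            word_docs[word].add(doc)

    # Every pair of documents sharing a word gains one unit of edge weight.
    weights = defaultdict(int)
    for docs in word_docs.values():
        for a, b in combinations(sorted(docs), 2):
            weights[(a, b)] += 1
    return dict(weights)


def modularity(weights, communities):
    """Newman modularity Q of a partition of a weighted undirected graph.

    Q = sum over communities c of  W_in(c)/m - (S(c) / (2m))^2,
    where m is the total edge weight, W_in(c) the weight of edges inside c,
    and S(c) the summed weighted degree of the nodes in c.
    """
    m = sum(weights.values())
    if m == 0:
        return 0.0
    label = {node: i for i, comm in enumerate(communities) for node in comm}

    internal = defaultdict(float)  # edge weight inside each community
    strength = defaultdict(float)  # summed weighted degree per community
    for (a, b), w in weights.items():
        strength[label[a]] += w
        strength[label[b]] += w
        if label[a] == label[b]:
            internal[label[a]] += w

    return sum(internal[c] / m - (strength[c] / (2 * m)) ** 2 for c in strength)


# Hypothetical toy corpus: two documents about graphs, two about baking.
corpus = {
    "d1": ["graph", "cluster", "node"],
    "d2": ["graph", "node", "edge"],
    "d3": ["recipe", "flour", "oven"],
    "d4": ["recipe", "oven", "sugar"],
}

weights = one_mode_projection(corpus)
q_split = modularity(weights, [{"d1", "d2"}, {"d3", "d4"}])
q_merged = modularity(weights, [{"d1", "d2", "d3", "d4"}])
print(weights)
print(q_split, q_merged)
```

In this toy case the partition that separates the two topics scores a higher modularity than the single merged cluster, which is the quantity a modularity optimization algorithm would try to maximize over all partitions of the projected graph.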