Text clustering using one-mode projection of document-word bipartite graphs

  • Authors:
  • Ajitesh Srivastava; Axel J. Soto; Evangelos Milios

  • Affiliations:
  • Birla Institute of Technology and Science, Pilani, India; Dalhousie University, Halifax, Canada; Dalhousie University, Halifax, Canada

  • Venue:
  • Proceedings of the 28th Annual ACM Symposium on Applied Computing
  • Year:
  • 2013

Abstract

Many real-life networks have an underlying bipartite structure from which a similarity between two nodes or data instances can be defined. For example, in a document corpus, the similarity between a pair of documents can be assumed to arise from the words that co-occur in them, and this document-word co-occurrence relationship can be modeled as a bipartite graph. A document similarity graph can be obtained by taking a one-mode projection of the bipartite graph, a popular technique for studying networks that arise from an underlying bipartite structure. A graph-based clustering algorithm can then be applied to this projection graph to obtain clusters of documents. In this paper we study the use of the one-mode projection of the document-word bipartite graph, followed by a modularity optimization algorithm, to cluster the documents. In particular, we propose an alternative, faster algorithm that works in two steps: first, it finds the documents that are easy to cluster; second, it assigns the remaining documents to existing or new clusters. We show that the algorithms based on one-mode projections perform significantly better than traditional clustering approaches. In addition, our method achieves similar or better clustering performance than the most popular algorithm for modularity optimization, while running four times faster.
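The two ingredients named in the abstract, a one-mode projection of the document-word bipartite graph and a modularity score over a partition of the projected graph, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' method: the function names `one_mode_projection` and `modularity` are hypothetical, the edge weight is assumed to be the raw number of distinct shared words (the paper may weight edges differently, e.g. by term weights or normalization), and the modularity is the standard Newman-Girvan definition. Neither the two-step algorithm nor any modularity optimization procedure is implemented here; the sketch only builds the projection and scores a given partition.

```python
from collections import defaultdict
from itertools import combinations


def one_mode_projection(docs):
    """Project a document-word bipartite graph onto the document side.

    docs maps a document id to its list of tokens. Two documents are joined
    by an edge whose weight is the number of distinct words they share
    (an assumed, unnormalized weighting chosen for illustration).
    """
    # Bipartite adjacency seen from the word side: word -> documents containing it.
    word_to_docs = defaultdict(set)
    for doc_id, tokens in docs.items():
        for word in set(tokens):
            word_to_docs[word].add(doc_id)

    # Every pair of documents that shares a word gets +1 on its edge.
    weights = defaultdict(int)
    for doc_ids in word_to_docs.values():
        for a, b in combinations(sorted(doc_ids), 2):
            weights[(a, b)] += 1
    return dict(weights)


def modularity(edges, labels):
    """Newman-Girvan modularity of a partition of a weighted undirected graph.

    Q = (1 / 2m) * sum_ij [A_ij - k_i * k_j / (2m)] * delta(c_i, c_j)
    edges maps (u, v) pairs to weights; labels maps each node to its cluster.
    """
    m = sum(edges.values())  # total edge weight
    if m == 0:
        return 0.0

    # Weighted degree k_i of every node.
    degree = defaultdict(float)
    for (a, b), w in edges.items():
        degree[a] += w
        degree[b] += w

    # sum_ij A_ij * delta over ordered pairs: each intra-cluster edge counts twice.
    internal = sum(2 * w for (a, b), w in edges.items() if labels[a] == labels[b])

    # sum_ij k_i * k_j * delta equals the sum over clusters of (total cluster degree)^2.
    cluster_degree = defaultdict(float)
    for node, cluster in labels.items():
        cluster_degree[cluster] += degree[node]
    expected = sum(d * d for d in cluster_degree.values()) / (2 * m)

    return (internal - expected) / (2 * m)


if __name__ == "__main__":
    docs = {
        "d1": "graph clustering modularity".split(),
        "d2": "graph modularity community".split(),
        "d3": "protein gene expression".split(),
        "d4": "gene protein sequence".split(),
    }
    edges = one_mode_projection(docs)
    print(modularity(edges, {"d1": 0, "d2": 0, "d3": 1, "d4": 1}))  # 0.5
    print(modularity(edges, {"d1": 0, "d2": 1, "d3": 0, "d4": 1}))  # -0.5
```

In the toy corpus, the topical split scores Q = 0.5 while a split that cuts both projected edges scores Q = -0.5, which is the signal a modularity optimizer exploits when it searches over partitions. Note that pairing every document that contains a given word makes very frequent words produce dense projections, so some form of stopword removal or term weighting is likely needed in practice; which choice the paper makes is not stated in this abstract.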