Integration of semantic-based bipartite graph representation and mutual refinement strategy for biomedical literature clustering

  • Authors:
  • Illhoi Yoo; Xiaohua Hu; Il-Yeol Song

  • Affiliations:
  • University of Missouri-Columbia, Columbia, MO; Drexel University, Philadelphia, PA; Drexel University, Philadelphia, PA

  • Venue:
  • Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining
  • Year:
  • 2006

Abstract

We introduce a novel document clustering approach that overcomes those problems by combining a semantic-based bipartite graph representation with a mutual refinement strategy. The primary contributions of this paper are as follows. First, we introduce a new representation of documents using a bipartite graph between documents and the co-occurrence concepts in those documents. Second, we show how to enhance clustering quality by applying the mutual refinement strategy to the initial clustering results. Third, through experiments on MEDLINE documents, we show that our integrated method significantly enhances cluster quality and clustering reliability compared to existing clustering methods. On average, our approach improves cluster quality by 29.5% and clustering reliability by 26.3%, in terms of misclassification index, over Bisecting K-means with the best parameters.
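
The bipartite representation named in the first contribution can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that concepts have already been extracted for each document and that a "co-occurrence concept" is an unordered pair of concepts appearing in the same document. The function name `build_bipartite_graph` and the sample data are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_bipartite_graph(doc_concepts):
    """Link each document to the concept pairs that co-occur in it.

    doc_concepts: dict mapping a document id to an iterable of concept
    strings (assumed to be extracted beforehand; extraction is not shown).
    Returns two adjacency maps, one per side of the bipartite graph.
    """
    doc_to_pairs = {}
    pair_to_docs = defaultdict(set)
    for doc_id, concepts in doc_concepts.items():
        # Sort and deduplicate so each unordered pair has one canonical form.
        unique = sorted(set(concepts))
        pairs = set(combinations(unique, 2))
        doc_to_pairs[doc_id] = pairs
        for pair in pairs:
            pair_to_docs[pair].add(doc_id)
    return doc_to_pairs, dict(pair_to_docs)


# Hypothetical toy corpus, for illustration only.
docs = {
    "d1": ["insulin", "diabetes", "glucose"],
    "d2": ["insulin", "glucose"],
    "d3": ["asthma", "steroid"],
}
doc_to_pairs, pair_to_docs = build_bipartite_graph(docs)

# Documents that share a concept pair are connected through that pair node.
shared = pair_to_docs[("glucose", "insulin")]
```

In this toy corpus, `d1` and `d2` are both adjacent to the pair node `("glucose", "insulin")`, while `d3` shares no pair node with either. That kind of shared neighborhood is what a clustering step could exploit.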