PhraseRank for document clustering: reweighting the weight of phrase

  • Authors:
  • Yoon-Ho Cho; Sang-Hyun Park; SangKeun Lee

  • Affiliations:
  • Korea University, Seoul, Republic of Korea (all three authors)

  • Venue:
  • Proceedings of the 2nd International Conference on Interaction Sciences: Information Technology, Culture and Human
  • Year:
  • 2009

Abstract

Given a document collection, a hierarchical clustering algorithm groups the documents into a hierarchy of clusters. Recent work has identified overlap phrases as useful features for hierarchical document clustering. However, that work did not consider the relationships among overlap phrases that co-occur in a document, or the degree of the opposite relationships between overlap phrases. In this paper, we propose new algorithms that compute a more effective similarity measure before the hierarchical clustering algorithm is run. The proposed methods have two key features: a ranked list of the top-k phrases for each overlap phrase, and the opposite significances of each pair of overlap phrases with respect to each other. Experimental results show that the proposed method improves clustering results.
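The abstract does not give the PhraseRank formulas, so the Python sketch below is only one hypothetical reading of the pipeline it outlines, not the authors' algorithm. It assumes overlap phrases have already been extracted from each document. It models the "opposite" significances of two phrases as the pair of conditional co-occurrence probabilities P(b | a) and P(a | b), which generally differ; builds a top-k list per phrase from those scores; boosts each phrase's weight by the significance of its top-k partners present in the same document; and compares documents by cosine similarity. All function names and the weighting formula are illustrative assumptions.

```python
# Hypothetical sketch only; NOT the PhraseRank algorithm from the paper.
# Assumes each document is already a list of extracted overlap phrases.
from collections import Counter, defaultdict
from itertools import combinations
import math


def cooccurrence_stats(docs):
    """Document frequency of each phrase and of each unordered phrase pair."""
    df = Counter()
    pair_df = Counter()
    for doc in docs:
        phrases = sorted(set(doc))
        df.update(phrases)
        pair_df.update(combinations(phrases, 2))
    return df, pair_df


def directional_significance(df, pair_df):
    """Assumed model: sig[a][b] = P(b in doc | a in doc), so sig[a][b] != sig[b][a] in general."""
    sig = defaultdict(dict)
    for (a, b), n_ab in pair_df.items():
        sig[a][b] = n_ab / df[a]
        sig[b][a] = n_ab / df[b]
    return sig


def top_k_lists(sig, k):
    """For each phrase, keep its k most significant co-occurring phrases (ties broken by name)."""
    return {
        a: sorted(neigh.items(), key=lambda kv: (-kv[1], kv[0]))[:k]
        for a, neigh in sig.items()
    }


def reweight(doc, topk):
    """Assumed reweighting: tf * (1 + sum of significances of top-k partners in this doc)."""
    tf = Counter(doc)
    present = set(tf)
    return {
        a: count * (1.0 + sum(s for b, s in topk.get(a, []) if b in present))
        for a, count in tf.items()
    }


def cosine(u, v):
    """Cosine similarity between two sparse phrase-weight vectors (dicts)."""
    dot = sum(w * v.get(p, 0.0) for p, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def similarity_matrix(docs, k=2):
    """Pairwise document similarities after the hypothetical phrase reweighting."""
    df, pair_df = cooccurrence_stats(docs)
    topk = top_k_lists(directional_significance(df, pair_df), k)
    vecs = [reweight(doc, topk) for doc in docs]
    n = len(docs)
    return [[cosine(vecs[i], vecs[j]) for j in range(n)] for i in range(n)]


if __name__ == "__main__":
    docs = [
        ["suffix tree", "document clustering", "phrase weight"],
        ["suffix tree", "document clustering"],
        ["neural network", "image recognition"],
    ]
    for row in similarity_matrix(docs, k=2):
        print([round(x, 3) for x in row])
```

The resulting similarity matrix could then be handed to any standard hierarchical clustering routine, for example by converting it to distances as 1 - similarity. Modeling the two directions as separate conditional probabilities is one simple way to make the significance of phrase a for phrase b differ from that of b for a.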