Clustering technique in multi-document personal name disambiguation

  • Authors:
  • Chen Chen; Hu Junfeng; Wang Houfeng

  • Affiliations:
  • Key Laboratory of Computational Linguistics (Peking University), China;Key Laboratory of Computational Linguistics (Peking University), China;Key Laboratory of Computational Linguistics (Peking University), China

  • Venue:
  • ACLstudent '09 Proceedings of the ACL-IJCNLP 2009 Student Research Workshop
  • Year:
  • 2009

Abstract

This paper addresses multi-document personal name disambiguation with an agglomerative clustering approach. We start from an analysis of the point-wise mutual information between each feature and the ambiguous name, which yields a novel method for computing feature weights in clustering. We then propose a stopping criterion for clustering based on a trade-off between within-cluster compactness and among-cluster separation. Next, we apply a labeling method to find representative features for each cluster. Finally, experiments on word-based clustering over a Chinese dataset show that the approach performs well.
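
The abstract does not give the paper's exact formulas, so the following is only a minimal sketch of the pipeline it describes, written under several assumptions: the standard PMI definition estimated from document frequencies (with negative values clipped to zero), cosine similarity over PMI-weighted word sets, average-linkage merging, and a simple "mean within-cluster similarity minus mean between-cluster similarity" score standing in for the paper's trade-off measure. All function names (`pmi_weights`, `cluster_mentions`, `label_cluster`) are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter
from itertools import combinations


def pmi_weights(name_docs, all_docs):
    """PMI(w, name) = log(P(w, name) / (P(w) * P(name))), from document counts.

    Assumption: all_docs contains every document in name_docs.
    """
    n_total, n_name = len(all_docs), len(name_docs)
    df_all = Counter(w for d in all_docs for w in set(d))
    df_name = Counter(w for d in name_docs for w in set(d))
    weights = {}
    for w, joint in df_name.items():
        p_joint = joint / n_total
        p_w, p_name = df_all[w] / n_total, n_name / n_total
        # Clipping negative PMI to zero is an assumption of this sketch.
        weights[w] = max(math.log(p_joint / (p_w * p_name)), 0.0)
    return weights


def cosine(u, v):
    dot = sum(x * v.get(k, 0.0) for k, x in u.items())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def average_link(a, b, sim):
    """Average pairwise similarity between two clusters of document indices."""
    return sum(sim[i][j] for i in a for j in b) / (len(a) * len(b))


def partition_score(clusters, sim):
    """Stand-in trade-off: mean within-cluster minus mean between-cluster similarity."""
    label = {i: c for c, members in enumerate(clusters) for i in members}
    within, between = [], []
    for i, j in combinations(sorted(label), 2):
        (within if label[i] == label[j] else between).append(sim[i][j])
    if not within:  # every cluster is a singleton
        return float("-inf")
    mean_between = sum(between) / len(between) if between else 0.0
    return sum(within) / len(within) - mean_between


def cluster_mentions(name_docs, all_docs):
    """Merge greedily down to one cluster and keep the best-scoring partition."""
    weights = pmi_weights(name_docs, all_docs)
    vecs = [{w: weights[w] for w in set(d) if weights.get(w, 0.0) > 0.0}
            for d in name_docs]
    n = len(vecs)
    sim = [[cosine(vecs[i], vecs[j]) for j in range(n)] for i in range(n)]
    clusters = [[i] for i in range(n)]
    best, best_score = [list(c) for c in clusters], float("-inf")
    while len(clusters) > 1:
        a, b = max(combinations(range(len(clusters)), 2),
                   key=lambda p: average_link(clusters[p[0]], clusters[p[1]], sim))
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
        score = partition_score(clusters, sim)
        if score > best_score:
            best, best_score = [sorted(c) for c in clusters], score
    return best, weights


def label_cluster(cluster, name_docs, weights):
    """Pick the feature with the largest summed weight inside the cluster."""
    totals = Counter()
    for i in cluster:
        for w in set(name_docs[i]):
            totals[w] += weights.get(w, 0.0)
    return totals.most_common(1)[0][0] if totals else None
```

Here `name_docs` is a list of token lists for the documents that mention the ambiguous name, and `all_docs` is the full collection including them. Choosing the partition after the fact, rather than halting at the first drop in score, is a design choice of this sketch: it avoids stopping early on a local dip, at the cost of running the merge loop to completion.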