A constraint-based probabilistic framework for name disambiguation

  • Authors:
  • Duo Zhang;Jie Tang;Juanzi Li;Kehong Wang

  • Affiliations:
  • Tsinghua University, Beijing, China;Tsinghua University, Beijing, China;Tsinghua University, Beijing, China;Tsinghua University, Beijing, China

  • Venue:
  • Proceedings of the sixteenth ACM conference on Conference on information and knowledge management
  • Year:
  • 2007

Quantified Score

Hi-index 0.00

Visualization

Abstract

This paper is concerned with the problem of name disambiguation. By name disambiguation, we mean distinguishing persons with the same name. It is a critical problem in many knowledge management applications. Despite much research work has been conducted, the problem is still not resolved and becomes even more serious, in particular with the popularity of Web 2.0. Previously, name disambiguation was often undertaken in either a supervised or unsupervised fashion. This paper first gives a constraint-based probabilistic model for semi-supervised name disambiguation. Specifically, we focus on investigating the problem in an academic researcher social network (http://arnetminer.org). The framework combines constraints and Euclidean distance learning, and allows the user to refine the disambiguation results. Experimental results on the researcher social network show that the proposed framework significantly outperforms the baseline method using unsupervised hierarchical clustering algorithm.