On Graph-Based Name Disambiguation

  • Authors: Xiaoming Fan; Jianyong Wang; Xu Pu; Lizhu Zhou; Bing Lv

  • Affiliations: Tsinghua University (all authors)

  • Venue: Journal of Data and Information Quality (JDIQ)

  • Year: 2011

Abstract

Name ambiguity arises because many people or objects share identical names in the real world. It degrades the performance of document retrieval, Web search, and information integration, and may cause confusion in other applications. Because the names are spelled identically and little additional information is available, accurately distinguishing the entities behind them is a nontrivial task. In this article, we investigate the problem in digital libraries, where the goal is to distinguish publications written by authors with identical names. We present an effective framework named GHOST (an abbreviation of GrapHical framewOrk for name diSambiguaTion) to solve the problem systematically. We devise a novel similarity metric and use only one type of attribute (i.e., coauthorship) in GHOST. Given the similarity matrix, intermediate results are grouped into clusters with Affinity Propagation, a recently introduced and powerful clustering algorithm. In addition, as a complementary technique, user feedback can be used to enhance performance. We evaluated the framework on real DBLP and PubMed datasets, and the experimental results show that GHOST achieves both high precision and high recall.
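
The pipeline outlined in the abstract (coauthorship-only similarity, then Affinity Propagation over the similarity matrix) can be illustrated with a minimal sketch. It is not the GHOST implementation: the abstract does not define the novel similarity metric, so a plain Jaccard overlap of coauthor sets is swapped in as an assumption. The function names, the sample papers, and the choice of the median similarity as the exemplar preference are likewise hypothetical. The clustering step follows the standard damped responsibility/availability updates of Affinity Propagation and assumes at least two papers.

```python
from statistics import median


def jaccard(a, b):
    """Jaccard overlap of two coauthor sets (assumed stand-in, not GHOST's metric)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity_matrix(coauthor_sets):
    """Pairwise similarity between papers that carry the same ambiguous author name."""
    n = len(coauthor_sets)
    return [[jaccard(coauthor_sets[i], coauthor_sets[j]) for j in range(n)]
            for i in range(n)]


def affinity_propagation(sim, preference=None, damping=0.5, iterations=200):
    """Return, for each paper, the index of its exemplar paper (needs n >= 2)."""
    n = len(sim)
    if preference is None:
        # Assumption: use the median off-diagonal similarity as the preference.
        preference = median(sim[i][j] for i in range(n) for j in range(n) if i != j)
    s = [[preference if i == j else sim[i][j] for j in range(n)] for i in range(n)]
    r = [[0.0] * n for _ in range(n)]  # responsibilities r(i, k)
    a = [[0.0] * n for _ in range(n)]  # availabilities a(i, k)
    for _ in range(iterations):
        # r(i,k) = s(i,k) - max over k' != k of (a(i,k') + s(i,k')), damped.
        for i in range(n):
            scores = [a[i][k] + s[i][k] for k in range(n)]
            for k in range(n):
                best_other = max(scores[j] for j in range(n) if j != k)
                r[i][k] = damping * r[i][k] + (1 - damping) * (s[i][k] - best_other)
        # a(i,k) = min(0, r(k,k) + sum of positive r(i',k), i' not in {i,k});
        # a(k,k) = sum of positive r(i',k), i' != k. Both are damped.
        for k in range(n):
            pos = [max(0.0, r[i][k]) for i in range(n)]
            total = sum(pos) - pos[k]
            for i in range(n):
                new = total if i == k else min(0.0, r[k][k] + total - pos[i])
                a[i][k] = damping * a[i][k] + (1 - damping) * new
    exemplars = [k for k in range(n) if r[k][k] + a[k][k] > 0]
    if not exemplars:
        return list(range(n))  # fallback: every paper forms its own cluster
    return [i if i in exemplars else max(exemplars, key=lambda k: s[i][k])
            for i in range(n)]


# Hypothetical coauthor sets for six papers signed with one ambiguous name.
papers = [
    {"Li", "Chen"}, {"Li", "Chen", "Zhao"}, {"Chen", "Zhao"},
    {"Smith", "Jones"}, {"Smith", "Jones", "Brown"}, {"Jones", "Brown"},
]
labels = affinity_propagation(similarity_matrix(papers))
print(labels)
```

Papers that share a label are attributed to the same real-world author. The number of clusters is not fixed in advance; it depends on the preference value, with lower preferences generally yielding fewer clusters.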