A Term-Based Driven Clustering Approach for Name Disambiguation

  • Authors:
  • Jia Zhu;Xiaofang Zhou;Gabriel Pui Fung

  • Affiliations:
  • School of ITEE, The University of Queensland, Australia;School of ITEE, The University of Queensland, Australia;School of ITEE, The University of Queensland, Australia

  • Venue:
  • APWeb/WAIM '09 Proceedings of the Joint International Conferences on Advances in Data and Web Management
  • Year:
  • 2009

Abstract

Name disambiguation in databases is a non-trivial task because people's names are often not unique and usually only limited information is associated with each name in the database. For example, in DBLP many authors share the same name, and there is no unique identifier to distinguish them. To make matters worse, we may not always be able to access the full contents of the materials unless we have joined the organizations (e.g. ACM) that publish them. Disambiguating names with such limited information is therefore a very challenging task. In this paper, we focus on this situation and propose a term-based driven clustering approach for solving it. Specifically, we first construct term-based taxonomies that mimic expert knowledge of the domain by automatically linking the related terms found there. Each taxonomy is then transformed into a graph, and we group the entries that belong to the same author using either of two novel models: a graph-based similarity model and a graph-based random walk model. The former computes the similarity among terms, whereas the latter estimates how likely one set of terms is to be transformed into another set of terms. Extensive experiments are conducted using the entries in DBLP. The favorable results indicate that our proposed approach is highly effective.
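
The abstract does not give the models' definitions, so the following is only a rough sketch of the general idea behind a random walk over a term graph, not the authors' method. It assumes (our assumptions, not the paper's) that terms are linked when they co-occur in the same entry, that edge weights are co-occurrence counts, and that the score is the probability mass reaching the target terms after a fixed number of steps. The function names `build_term_graph` and `walk_probability` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_term_graph(entries):
    """Link terms that co-occur in an entry (assumed linking rule).

    `entries` is a list of term lists, e.g. keywords taken from titles.
    Edge weight is the number of entries in which both terms appear.
    """
    graph = defaultdict(lambda: defaultdict(float))
    for terms in entries:
        for a, b in combinations(sorted(set(terms)), 2):
            graph[a][b] += 1.0
            graph[b][a] += 1.0
    return graph


def walk_probability(graph, source_terms, target_terms, steps=2):
    """Probability that a walker started uniformly on `source_terms`
    sits on any of `target_terms` after exactly `steps` steps."""
    source = set(source_terms) & set(graph)
    if not source:
        return 0.0  # none of the source terms appear in the graph
    dist = {term: 1.0 / len(source) for term in source}
    for _ in range(steps):
        nxt = defaultdict(float)
        for node, mass in dist.items():
            total = sum(graph[node].values())
            # Spread the mass over neighbours in proportion to edge weight.
            for neighbour, weight in graph[node].items():
                nxt[neighbour] += mass * weight / total
        dist = nxt
    return sum(dist.get(term, 0.0) for term in set(target_terms))


if __name__ == "__main__":
    entries = [
        ["graph", "clustering"],
        ["clustering", "database"],
        ["protein", "folding"],
    ]
    g = build_term_graph(entries)
    print(walk_probability(g, ["graph"], ["database"], steps=2))
    print(walk_probability(g, ["graph"], ["protein"], steps=2))
```

Under these assumptions, two entries whose term sets reach each other with high probability would be candidates for the same author, while entries in disconnected parts of the graph would score zero. The threshold for merging entries is left open here because the abstract does not specify one.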