A hierarchical naive Bayes mixture model for name disambiguation in author citations

  • Authors:
  • Hui Han; Wei Xu; Hongyuan Zha; C. Lee Giles

  • Affiliations:
  • Yahoo Inc., Sunnyvale, CA; NEC Laboratories America, Inc., Cupertino, CA; The Pennsylvania State University, PA; The Pennsylvania State University, PA

  • Venue:
  • Proceedings of the 2005 ACM symposium on Applied computing
  • Year:
  • 2005

Abstract

Because of name variations, an author may have multiple names and multiple authors may share the same name. Such name ambiguity degrades the performance of document retrieval, web search, and database integration, and may cause improper attribution to authors. This paper presents a hierarchical naive Bayes mixture model, an unsupervised learning approach, for name disambiguation in author citations. The method partitions a collection of citations into clusters, with each cluster containing only citations written by the same author, thus disambiguating authorship in citations to induce author name identities. Three types of citation features are used: co-author names, paper title words, and journal or proceedings title words. The approach is illustrated with 16 name datasets constructed from publication lists collected from author homepages and the DBLP computer science bibliography.
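
To make the clustering idea concrete, the following is a minimal sketch of a flat multinomial naive Bayes mixture fitted with EM over the three feature types named in the abstract. It is a simplification for illustration and not the paper's hierarchical model: the function names (citation_tokens, fit_mixture), the smoothing constant alpha, the deterministic seeding step, and the toy citations are all assumptions introduced here.

```python
import math
from collections import Counter


def citation_tokens(coauthors, title, venue):
    """Flatten the three feature types into one bag of prefixed tokens."""
    tokens = ["A:" + name.lower() for name in coauthors]
    tokens += ["T:" + word for word in title.lower().split()]
    tokens += ["V:" + word for word in venue.lower().split()]
    return Counter(tokens)


def fit_mixture(docs, k, iters=50, alpha=0.1):
    """Cluster token bags with EM for a mixture of multinomials.

    Assumption: `alpha` is an additive smoothing constant chosen for this
    sketch; it keeps every probability strictly positive.
    """
    n = len(docs)
    vocab = sorted({t for d in docs for t in d})

    # Deterministic seeding (an assumption of this sketch): start from the
    # first citation, then repeatedly add the citation that shares the
    # fewest tokens with the seeds chosen so far.
    seeds = [0]
    while len(seeds) < k:
        candidates = [i for i in range(n) if i not in seeds]
        seeds.append(min(
            candidates,
            key=lambda i: max(sum((docs[i] & docs[s]).values()) for s in seeds),
        ))

    # Soft assignments: seeds are pinned to their cluster, the rest uniform.
    resp = [[1.0 / k] * k for _ in range(n)]
    for c, s in enumerate(seeds):
        resp[s] = [1.0 if j == c else 0.0 for j in range(k)]

    for _ in range(iters):
        # M-step: re-estimate cluster priors and per-cluster token probabilities.
        prior = [(sum(r[c] for r in resp) + alpha) / (n + k * alpha)
                 for c in range(k)]
        log_p = []
        for c in range(k):
            counts = {t: alpha for t in vocab}
            for d, r in zip(docs, resp):
                for t, cnt in d.items():
                    counts[t] += r[c] * cnt
            total = sum(counts.values())
            log_p.append({t: math.log(v / total) for t, v in counts.items()})

        # E-step: posterior cluster probabilities, normalised in log space.
        for i, d in enumerate(docs):
            logs = [math.log(prior[c])
                    + sum(cnt * log_p[c][t] for t, cnt in d.items())
                    for c in range(k)]
            top = max(logs)
            weights = [math.exp(v - top) for v in logs]
            z = sum(weights)
            resp[i] = [w / z for w in weights]

    # Hard assignment: the most probable cluster for each citation.
    return [max(range(k), key=lambda c: r[c]) for r in resp]


# Hypothetical citations that all list the same ambiguous author name.
citations = [
    citation_tokens(["A. Kumar", "B. Lee"],
                    "support vector machines for text classification",
                    "machine learning conference"),
    citation_tokens(["A. Kumar"],
                    "kernel methods for text classification",
                    "machine learning conference"),
    citation_tokens(["C. Rossi"],
                    "protein folding simulation",
                    "journal of computational biology"),
    citation_tokens(["C. Rossi", "D. Weber"],
                    "protein structure prediction",
                    "journal of computational biology"),
]

labels = fit_mixture(citations, k=2)
print(labels)
```

On this toy input the first two citations should land in one cluster and the last two in the other, since the two groups share no co-author, title, or venue tokens. The number of clusters k is supplied by hand here; how the number of distinct authors is chosen in practice is outside the scope of this sketch.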