Using web information for author name disambiguation

  • Authors:
  • Denilson Alves Pereira;Berthier Ribeiro-Neto;Nivio Ziviani;Alberto H.F. Laender;Marcos André Gonçalves;Anderson A. Ferreira

  • Affiliations:
  • Federal University of Minas Gerais, Belo Horizonte, Brazil;Google Engineering, Belo Horizonte, Brazil;Federal University of Minas Gerais, Belo Horizonte, Brazil;Federal University of Minas Gerais, Belo Horizonte, Brazil;Federal University of Minas Gerais, Belo Horizonte, Brazil;Federal University of Minas Gerais, Belo Horizonte, Brazil

  • Venue:
  • Proceedings of the 9th ACM/IEEE-CS joint conference on Digital libraries
  • Year:
  • 2009

Abstract

In digital libraries, ambiguous author names may occur due to the existence of multiple authors with the same name (polysemes) or different name variations for the same author (synonyms). We propose a new method that uses information available on the Web to deal with both problems at the same time. Our idea consists of gathering information from input citations and submitting queries to a Web search engine, aiming at finding curricula vitae and Web pages containing publications of the ambiguous authors. From the content of the documents in the answer sets returned by the Web search engine, we extract useful information that can help in the disambiguation process. Using this information, author names are disambiguated by means of a hierarchical clustering method that groups citations found in the same document together in a bottom-up fashion. Experimental results show that our method outperforms two state-of-the-art unsupervised methods and is statistically comparable with a supervised one, while requiring no training. We observe gains of up to 65.2% in the pairwise F1 metric when compared with our best unsupervised baseline method.
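The bottom-up grouping and the pairwise F1 metric mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: it assumes the search and extraction steps have already produced, for each citation, the set of Web documents in which it was found, and it substitutes a plain "merge whenever two clusters share a document" rule for the paper's hierarchical clustering. The names `cluster_by_shared_documents`, `pairwise_f1`, and all citation and document identifiers are hypothetical.

```python
from itertools import combinations


def cluster_by_shared_documents(citation_docs):
    """Merge citations bottom-up whenever two clusters share a Web document.

    citation_docs maps a citation id to the set of document ids (hypothetical
    identifiers) in which that citation was found. This is a simplified
    stand-in for the clustering step, not the paper's actual procedure.
    """
    # Start with one singleton cluster per citation.
    clusters = [({cid}, set(docs)) for cid, docs in citation_docs.items()]
    merged = True
    while merged:
        merged = False
        for i, j in combinations(range(len(clusters)), 2):
            if clusters[i][1] & clusters[j][1]:  # at least one shared document
                members = clusters[i][0] | clusters[j][0]
                docs = clusters[i][1] | clusters[j][1]
                clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
                clusters.append((members, docs))
                merged = True
                break  # restart the scan over the updated cluster list
    return [members for members, _ in clusters]


def pairwise_f1(predicted, truth):
    """Pairwise F1: harmonic mean of precision and recall over citation pairs
    that are placed in the same cluster."""
    def same_cluster_pairs(clusters):
        return {frozenset(p) for c in clusters for p in combinations(sorted(c), 2)}

    pred_pairs = same_cluster_pairs(predicted)
    true_pairs = same_cluster_pairs(truth)
    correct = len(pred_pairs & true_pairs)
    if correct == 0:
        return 0.0
    precision = correct / len(pred_pairs)
    recall = correct / len(true_pairs)
    return 2 * precision * recall / (precision + recall)


# Toy input: c1 and c2 appear in the same CV, c2 and c3 in the same Web page.
citation_docs = {
    "c1": {"cv_a"},
    "c2": {"cv_a", "page_b"},
    "c3": {"page_b"},
    "c4": {"cv_z"},
    "c5": set(),  # citation found in no retrieved document
}
predicted = cluster_by_shared_documents(citation_docs)
truth = [{"c1", "c2"}, {"c3", "c4"}, {"c5"}]  # made-up ground truth
score = pairwise_f1(predicted, truth)
```

In this toy input, citations that were found in no document stay in singleton clusters, which is one reason a real system would need additional evidence beyond document co-occurrence.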