An unsupervised heuristic-based hierarchical method for name disambiguation in bibliographic citations

Authors:
Ricardo G. Cota; Anderson A. Ferreira; Cristiano Nascimento; Marcos André Gonçalves; Alberto H. F. Laender
Affiliations:
Department of Computer Science, Federal University of Minas Gerais, Av. Antônio Carlos 6627, CEP 31270-010, Belo Horizonte – MG, Brazil (all authors)
Venue:
Journal of the American Society for Information Science and Technology
Year:
2010

Cited by (12):
- Incorporating user feedback into name disambiguation of scientific cooperation network. WAIM'11: Proceedings of the 12th international conference on Web-age information management
- Authormagic: an approach to author disambiguation in large-scale digital libraries. Proceedings of the 20th ACM international conference on Information and knowledge management
- Cost-effective on-demand associative author name disambiguation. Information Processing and Management: an International Journal
- A tool for generating synthetic authorship records for evaluating author name disambiguation methods. Information Sciences: an International Journal
- Active associative sampling for author name disambiguation. Proceedings of the 12th ACM/IEEE-CS joint conference on Digital Libraries
- A brief survey of automatic methods for author name disambiguation. ACM SIGMOD Record
- Author name disambiguation: What difference does it make in author-based citation analysis? Journal of the American Society for Information Science and Technology
- Author name disambiguation using a new categorical distribution similarity. ECML PKDD'12: Proceedings of the 2012 European conference on Machine Learning and Knowledge Discovery in Databases, Volume Part I
- An automatic system for identifying authorities in digital libraries. Expert Systems with Applications: An International Journal
- A relevance feedback approach for the author name disambiguation problem. Proceedings of the 13th ACM/IEEE-CS joint conference on Digital libraries
- Aggregating productivity indices for ranking researchers across multiple areas. Proceedings of the 13th ACM/IEEE-CS joint conference on Digital libraries
- Name disambiguation in scientific cooperation network by exploiting user feedback. Artificial Intelligence Review

Abstract

Name ambiguity in bibliographic citations is a difficult problem that, despite many efforts by the research community, still leaves much room for improvement. In this article, we present a heuristic-based hierarchical clustering method to deal with this problem. The method successively fuses clusters of citations with similar author names, based on several heuristics and similarity measures applied to the components of the citations (e.g., coauthor names, work title, and publication venue title). During the disambiguation task, the information about fused clusters is aggregated, providing more evidence for the next round of fusion. To demonstrate the effectiveness of our method, we ran a series of experiments on two collections extracted from real-world digital libraries and compared it, under two metrics, with four representative methods from the literature. We compare results obtained by combining the author name attribute with each of the other attributes separately (i.e., coauthor names, work title, and publication venue title) and by using all attributes together. These results show that our unsupervised method, when using all attributes, performs competitively against all other methods under both metrics, losing in only one case to a supervised method whose result was very close to ours. Moreover, these results are achieved without the burden of any training and without using any privileged information, such as knowing a priori the correct number of clusters. © 2010 Wiley Periodicals, Inc.
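To make the idea of heuristic-driven hierarchical fusion concrete, the following is a minimal sketch, not the authors' published algorithm. Everything specific in it is an assumption chosen for illustration: the hypothetical helpers `compatible`, `should_fuse`, `merge`, and `disambiguate`; the name rule (same last name and same first initial); treating a shared coauthor as sufficient evidence; and a cosine similarity over title and venue words with a hypothetical `threshold` parameter. It does show the two properties the abstract describes: each fusion aggregates the evidence of both clusters for later rounds, and the process stops when no pair satisfies the heuristics, so the number of clusters is never given in advance.

```python
from collections import Counter
from dataclasses import dataclass, field
import math
import re


@dataclass
class Cluster:
    """Citations assumed to belong to one real author (hypothetical structure)."""
    name: str                                         # author name string as cited
    citation_ids: list[int]
    coauthors: set[str] = field(default_factory=set)
    terms: Counter = field(default_factory=Counter)   # aggregated title + venue words


def tokens(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def compatible(a: str, b: str) -> bool:
    """Assumed name heuristic: same last name and same first initial."""
    pa, pb = a.lower().split(), b.lower().split()
    return pa[-1] == pb[-1] and pa[0][0] == pb[0][0]


def cosine(c1: Counter, c2: Counter) -> float:
    dot = sum(c1[t] * c2[t] for t in c1.keys() & c2.keys())
    norm = math.sqrt(sum(v * v for v in c1.values())) * math.sqrt(sum(v * v for v in c2.values()))
    return dot / norm if norm else 0.0


def should_fuse(x: Cluster, y: Cluster, threshold: float) -> bool:
    if not compatible(x.name, y.name):
        return False
    if x.coauthors & y.coauthors:      # assumed: a shared coauthor is strong evidence
        return True
    return cosine(x.terms, y.terms) >= threshold


def merge(x: Cluster, y: Cluster) -> Cluster:
    # Aggregate coauthors and terms so the fused cluster carries more
    # information into the next round of fusion.
    return Cluster(x.name, x.citation_ids + y.citation_ids,
                   x.coauthors | y.coauthors, x.terms + y.terms)


def disambiguate(citations: list[dict], threshold: float = 0.3) -> list[list[int]]:
    # Start bottom-up: one cluster per citation.
    clusters = [
        Cluster(c["author"], [i], set(c["coauthors"]),
                Counter(tokens(c["title"]) + tokens(c["venue"])))
        for i, c in enumerate(citations)
    ]
    fused = True
    while fused:                       # stop when no pair satisfies the heuristics
        fused = False
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                if should_fuse(clusters[i], clusters[j], threshold):
                    clusters[i] = merge(clusters[i], clusters[j])
                    del clusters[j]
                    fused = True
                    break
            if fused:
                break
    return [sorted(c.citation_ids) for c in clusters]


if __name__ == "__main__":
    records = [
        {"author": "A. Silva", "coauthors": ["M. Costa"],
         "title": "Clustering citation records", "venue": "JCDL"},
        {"author": "Ana Silva", "coauthors": ["M. Costa", "J. Lima"],
         "title": "Name disambiguation in digital libraries", "venue": "JCDL"},
        {"author": "A. Silva", "coauthors": ["P. Rocha"],
         "title": "Protein folding kinetics", "venue": "Biophysical Journal"},
    ]
    print(disambiguate(records))       # [[0, 1], [2]]
```

In the example, the first two citations fuse through the shared coauthor, while the third stays separate because its aggregated terms share no words with the fused cluster. The pairwise scan restarts after every fusion, which keeps the sketch short but is cubic in the worst case; a practical implementation would first block citations by compatible names and index clusters to avoid rescanning.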