Effective self-training author name disambiguation in scholarly digital libraries

Authors:
Anderson A. Ferreira;Adriano Veloso;Marcos André Gonçalves;Alberto H.F. Laender
Affiliations:
Universidade Federal de Minas Gerais, Belo Horizonte, Brazil;Universidade Federal de Minas Gerais, Belo Horizonte, Brazil;Universidade Federal de Minas Gerais, Belo Horizonte, Brazil;Universidade Federal de Minas Gerais, Belo Horizonte, Brazil
Venue:
Proceedings of the 10th annual joint conference on Digital libraries
Year:
2010

Abstract

Name ambiguity in bibliographic citation records is a hard problem that affects the quality of services and content in digital libraries and similar systems. Supervised methods that exploit training examples to distinguish ambiguous author names are among the most effective solutions to the problem, but they require skilled human annotators to label citations manually, in a laborious and continuous process, in order to provide enough training examples. Such systems therefore urgently need (i) automatic acquisition of examples and (ii) highly effective disambiguation even when only a few examples are available. In this paper, we propose a novel two-step disambiguation method, SAND (Self-training Associative Name Disambiguator), that deals with these two issues. The first step eliminates the need for any manual labeling effort by automatically acquiring examples using a clustering method that groups citation records based on the similarity among coauthor names. The second step uses a supervised disambiguation method that is able to detect unseen authors not included in any of the given training examples. Experiments conducted with standard public collections, using the minimum set of attributes present in a citation (i.e., author names, work title and publication venue), demonstrated that our proposed method outperforms representative unsupervised disambiguation methods that exploit similarities between citation records and is as effective as, and in some cases superior to, supervised ones, without manually labeling any training example.
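
The first step described in the abstract, grouping citation records by coauthor similarity so that the groups can serve as automatically acquired training examples, might be sketched roughly as follows. This is only an illustration under simplifying assumptions, not the SAND procedure itself: the function name `cluster_by_coauthors`, the rule "share at least one coauthor name", the union-find grouping, and the sample records are all hypothetical and do not come from the paper.

```python
from itertools import combinations


def cluster_by_coauthors(records):
    """Group citation records that share at least one coauthor name.

    `records` is a list of sets of coauthor names, with the ambiguous
    name itself excluded. Returns a sorted list of clusters, each a list
    of record indices. Plain union-find is an assumed stand-in for the
    clustering step mentioned in the abstract.
    """
    parent = list(range(len(records)))

    def find(i):
        # Walk up to the cluster representative, shortening the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(records)), 2):
        if records[i] & records[j]:  # at least one coauthor in common
            parent[find(i)] = find(j)

    clusters = {}
    for i in range(len(records)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


# Hypothetical records for one ambiguous author name (invented data).
records = [
    {"a. veloso", "m. goncalves"},
    {"m. goncalves", "a. laender"},
    {"j. smith"},
    {"j. smith", "k. lee"},
    set(),  # no coauthors: remains a singleton cluster
]
clusters = cluster_by_coauthors(records)
print(clusters)
```

In a sketch like this, each resulting cluster would be treated as one presumed author, and its records would be handed to the supervised second step as labeled examples. Records without coauthors end up alone, which is one reason the second step needs to cope with authors absent from the training examples.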