Active associative sampling for author name disambiguation

Authors:
Anderson A. Ferreira; Rodrigo Silva; Marcos André Gonçalves; Adriano Veloso; Alberto H.F. Laender
Affiliations:
UFOP, Ouro Preto, Brazil; UFMG, Belo Horizonte, Brazil; UFMG, Belo Horizonte, Brazil; UFMG, Belo Horizonte, Brazil; UFMG, Belo Horizonte, Brazil
Venue:
Proceedings of the 12th ACM/IEEE-CS joint conference on Digital Libraries
Year:
2012

Abstract

One of the hardest problems faced by current scholarly digital libraries is author name ambiguity. This problem occurs when, in a set of citation records, there are records of the same author under distinct names, or citation records belonging to distinct authors with similar names. Among the several proposed methods, the most effective ones seem to be those that directly assign records to their respective authors by applying supervised machine learning techniques. The effectiveness of such methods is usually directly correlated with the amount of labeled training data available. However, acquiring training examples requires skilled human annotators to manually label citation records. To reduce the number of examples that must be labeled to produce the training data, in this paper we propose a new active sampling strategy based on association rules for the author name disambiguation task. We compare our strategy with state-of-the-art supervised baselines that use the complete labeled training dataset, as well as with other active methods, and show that very competitive disambiguation effectiveness can be obtained with training sets reduced by up to 71%.
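The abstract names the idea (active sampling driven by association rules) but not the exact selection criterion. As a hedged illustration only, the sketch below assumes one plausible rule-based criterion: for each unlabeled citation record, project the labeled examples onto that record's features, count the rules `X -> author` that can be mined from the projection, and ask an annotator to label the record with the fewest rules, i.e., the one the current training data covers least. The feature encoding (strings such as `"coauthor:silva"`), the functions `count_rules` and `active_sample`, the `oracle` callback standing in for the human annotator, and the `max_len`/`min_support` parameters are all hypothetical; this is not the paper's exact algorithm.

```python
from collections import Counter
from itertools import combinations


def count_rules(record, training, max_len=2, min_support=1):
    """Count rules X -> author, with X a subset of `record`'s features,
    mined from the training set projected onto those features.
    (Hypothetical criterion used for illustration.)"""
    support = Counter()
    for features, author in training:
        # Projection: keep only the features this training example shares with `record`.
        shared = sorted(record & features)
        for k in range(1, max_len + 1):
            for itemset in combinations(shared, k):
                support[(itemset, author)] += 1
    # A rule is kept if its antecedent/author pair reaches the minimum support count.
    return sum(1 for s in support.values() if s >= min_support)


def active_sample(pool, oracle, budget, max_len=2, min_support=1):
    """Greedily label the records least covered by rules from the current training set.

    `oracle(record)` stands in for the human annotator and returns an author label.
    Ties (e.g., when the training set is still empty) go to the earliest record.
    """
    pool = list(pool)
    training = []
    for _ in range(min(budget, len(pool))):
        best = min(
            range(len(pool)),
            key=lambda i: count_rules(pool[i], training, max_len, min_support),
        )
        record = pool.pop(best)
        training.append((record, oracle(record)))
    return training


if __name__ == "__main__":
    # Toy records: the first two share a coauthor and a venue, the third does not.
    records = [
        frozenset({"coauthor:silva", "venue:jcdl", "title:disambiguation"}),
        frozenset({"coauthor:silva", "venue:jcdl", "title:names"}),
        frozenset({"coauthor:goncalves", "venue:sigir", "title:ranking"}),
    ]
    truth = {records[0]: "a1", records[1]: "a1", records[2]: "a2"}
    for features, author in active_sample(records, truth.__getitem__, budget=2):
        print(author, sorted(features))
```

With the toy data, the first record is labeled first (all counts are zero and the tie goes to the earliest record). The second record then yields three rules because it shares two features with the first, while the third yields none, so the sampler spends its second label on the third record. Records that are near-duplicates of already-labeled ones are skipped, which is how a rule-coverage criterion of this kind can shrink the training set.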