Active associative sampling for author name disambiguation

  • Authors: Anderson A. Ferreira; Rodrigo Silva; Marcos André Gonçalves; Adriano Veloso; Alberto H.F. Laender
  • Affiliations: UFOP, Ouro Preto, Brazil; UFMG, Belo Horizonte, Brazil; UFMG, Belo Horizonte, Brazil; UFMG, Belo Horizonte, Brazil; UFMG, Belo Horizonte, Brazil
  • Venue: Proceedings of the 12th ACM/IEEE-CS joint conference on Digital Libraries
  • Year: 2012

Abstract

One of the hardest problems faced by current scholarly digital libraries is author name ambiguity. This problem occurs when a set of citation records contains records of the same author under distinct names, or records belonging to distinct authors with similar names. Among the several proposed methods, the most effective ones seem to be based on directly assigning records to their respective authors by applying supervised machine learning techniques. The effectiveness of such methods is usually directly correlated with the amount of supervised training data available. However, acquiring training examples requires skilled human annotators to manually label references. Aiming to reduce the set of examples needed to produce the training data, in this paper we propose a new active sampling strategy based on association rules for the author name disambiguation task. We compare our strategy with state-of-the-art supervised baselines that use the complete labeled training dataset, as well as with other active methods, and show that very competitive disambiguation effectiveness can be obtained with reductions in the training set of up to 71%.
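
The abstract does not describe how the sampling strategy works, so the following is only a rough sketch of the general idea behind active sampling: choosing a small, informative subset of citation records for annotators to label. It is not the paper's algorithm. The selection rule (greedily pick the record whose features are least covered by the records already chosen), the feature encoding, the function name `select_for_labeling`, and the example records are all assumptions made for illustration.

```python
from typing import FrozenSet, List, Set

# Assumption: a citation record is represented as a set of feature tokens
# (coauthor names, venue, title words). This encoding is illustrative only.
Record = FrozenSet[str]


def select_for_labeling(pool: List[Record], budget: int) -> List[int]:
    """Greedily pick up to `budget` indices of records to send to annotators.

    At each step, choose the record whose features overlap least with the
    features already covered by the selected records (ties go to the lowest
    index). Stop early once every remaining record is fully covered.
    """
    selected: List[int] = []
    covered: Set[str] = set()
    remaining = list(range(len(pool)))

    def overlap(i: int) -> float:
        # Fraction of record i's features already seen; empty records add nothing.
        return len(pool[i] & covered) / len(pool[i]) if pool[i] else 1.0

    while remaining and len(selected) < budget:
        best = min(remaining, key=lambda i: (overlap(i), i))
        if overlap(best) >= 1.0:
            break  # nothing new left to learn from the pool
        selected.append(best)
        covered |= pool[best]
        remaining.remove(best)
    return selected


# Hypothetical unlabeled pool of four citation records.
pool = [
    frozenset({"coauthor:j.doe", "venue:jcdl", "title:disambiguation"}),
    frozenset({"coauthor:j.doe", "venue:jcdl", "title:sampling"}),
    frozenset({"coauthor:k.lee", "venue:sigmod", "title:indexing"}),
    frozenset({"coauthor:j.doe", "venue:jcdl"}),
]

print(select_for_labeling(pool, budget=2))   # [0, 2]
print(select_for_labeling(pool, budget=10))  # [0, 2, 1]
```

With a generous budget, the sketch still leaves the fourth record unlabeled because it contributes no unseen features. This mirrors, in a very simplified way, how an active strategy can shrink the set of examples that annotators must label.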