SyGAR: a synthetic data generator for evaluating name disambiguation methods

  • Authors:
  • Anderson A. Ferreira; Marcos André Gonçalves; Jussara M. Almeida; Alberto H. F. Laender; Adriano Veloso

  • Affiliations:
  • Department of Computer Science, Federal University of Minas Gerais, Belo Horizonte, MG, Brazil (all authors)

  • Venue:
  • ECDL'09: Proceedings of the 13th European Conference on Research and Advanced Technology for Digital Libraries
  • Year:
  • 2009

Abstract

Name ambiguity in bibliographic citations is one of the hardest problems currently faced by the digital library community. Several disambiguation methods have been proposed in the literature, but none of them solves the problem completely. More importantly, almost all of these methods were tested only in limited and restricted scenarios, which raises concerns about their practical applicability. In this work, we address these limitations by proposing SyGAR, a synthetic generator of ambiguous authorship records. The generator was validated against a gold standard collection of disambiguated records and applied to evaluate three disambiguation methods in a relevant scenario.