Uma abordagem efetiva e eficiente para deduplicação de metadados bibliográficos de objetos digitais (An effective and efficient approach to deduplicating bibliographic metadata of digital objects)

  • Authors:
  • Eduardo N. Borges; Renata M. Galante; Marcos A. Gonçalves

  • Affiliations:
  • Universidade Federal do Rio Grande do Sul (UFRGS), Porto Alegre, RS, Brasil; Universidade Federal do Rio Grande do Sul (UFRGS), Porto Alegre, RS, Brasil; Universidade Federal de Minas Gerais (UFMG), Belo Horizonte, MG, Brasil

  • Venue:
  • SBBD '08: Proceedings of the 23rd Brazilian Symposium on Databases
  • Year:
  • 2008

Abstract

Digital libraries contain collections of digital objects acquired from different sources, which can be described using several metadata standards. These metadata are heterogeneous in both content and structure. This paper presents an approach that identifies duplicate metadata records referring to the same digital library object. We propose similarity functions designed for the digital library domain that compare the content of metadata records. Experimental results show that, compared with three different baselines, the proposed functions improve the quality of metadata deduplication by 0.64% to 31.5%, while using an algorithm with linear complexity to compare author names.
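The abstract does not describe how the linear-complexity author-name comparison works. As a rough illustration only, here is a minimal Python sketch of one way such a comparison could run in time linear in the length of the names. It assumes a matching rule of our own choosing: surnames must match exactly after accent folding, and the given-name tokens of the shorter name are aligned left to right against those of the longer name, with a single-letter initial matching any token that starts with that letter. The function names (`normalize`, `token_compatible`, `author_names_match`) are hypothetical. This is not the authors' published function.

```python
import unicodedata


def normalize(name: str) -> list[str]:
    """Lowercase, strip accents and punctuation, and return name tokens.

    Assumption: "Surname, Given Names" is rewritten as "Given Names Surname".
    """
    if "," in name:
        surname, _, given = name.partition(",")
        name = f"{given} {surname}"
    # Decompose accented letters (e.g. "ç" -> "c" + combining cedilla), drop the marks.
    decomposed = unicodedata.normalize("NFKD", name)
    ascii_only = "".join(c for c in decomposed if not unicodedata.combining(c))
    # Replace any non-alphanumeric character (dots, hyphens) with a space.
    cleaned = "".join(c if c.isalnum() else " " for c in ascii_only.lower())
    return cleaned.split()


def token_compatible(a: str, b: str) -> bool:
    """Tokens match if equal, or if one is a single-letter initial of the other."""
    if a == b:
        return True
    if len(a) == 1 or len(b) == 1:
        return a[0] == b[0]
    return False


def author_names_match(x: str, y: str) -> bool:
    """Hypothetical linear-time author-name comparison.

    Surnames (last tokens) must agree exactly. The given-name tokens of the
    shorter name are then matched, in order, to the earliest compatible token
    of the longer name; the pointer i only moves forward, so each token is
    visited at most once.
    """
    tx, ty = normalize(x), normalize(y)
    if not tx or not ty or tx[-1] != ty[-1]:
        return False
    gx, gy = tx[:-1], ty[:-1]
    if len(gx) < len(gy):
        gx, gy = gy, gx  # make gx the longer list of given-name tokens
    i = 0
    for tok in gy:
        # Skip tokens of the longer name until one is compatible with tok.
        while i < len(gx) and not token_compatible(gx[i], tok):
            i += 1
        if i == len(gx):
            return False
        i += 1
    return True
```

For example, `author_names_match("Borges, Eduardo N.", "E. Borges")` returns `True`, while `author_names_match("R. Galante", "M. Galante")` returns `False`. A greedy alignment is enough here because matching each token to the earliest compatible position never rules out a match for a later token.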