Consolidation of References to Persons in Bibliographic Databases

  • Authors:
  • Nuno Freire; José Borbinha; Bruno Martins

  • Affiliations:
  • Instituto Superior Técnico, Technical University of Lisbon, Lisboa, Portugal 1049-001; Instituto Superior Técnico, Technical University of Lisbon, Lisboa, Portugal 1049-001; Instituto Superior Técnico, Technical University of Lisbon, Lisboa, Portugal 1049-001

  • Venue:
  • ICADL '08: Proceedings of the 11th International Conference on Asian Digital Libraries: Universal and Ubiquitous Access to Information
  • Year:
  • 2008

Quantified Score

Hi-index 0.00

Abstract

Entity resolution is the process of determining whether, in a specific context, two or more references correspond to the same entity. In this work, we address this problem in the context of references to persons as they are found in bibliographic data, specifically in the case of consolidating multiple datasets. Our solution follows the extraction, transformation and loading (ETL) process typical of data warehouses. It computes the similarities of the attribute values for the references, and employs a decision tree to decide when the references match. We describe the characteristics of these references within bibliographic datasets, and how we explored those characteristics by developing new similarity metrics to improve the quality of the consolidation process. We evaluated our work by designing an experiment with data from four national libraries. The results show that the proposed similarity metrics contribute significantly to the consolidation process.
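
The general pattern described in the abstract (compute per-attribute similarities for a pair of references, then let a decision tree decide whether they match) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration: the record fields (`name`, `birth`), the use of `difflib.SequenceMatcher` as a generic string similarity, the hand-written tree, and its thresholds. The abstract does not specify the paper's own similarity metrics or its decision tree, and this sketch does not reproduce them.

```python
from difflib import SequenceMatcher


def normalize_name(name: str) -> str:
    """Case-fold, drop commas, and collapse whitespace."""
    return " ".join(name.casefold().replace(",", " ").split())


def name_similarity(a: str, b: str) -> float:
    """Generic character-level similarity in [0, 1] (a stand-in metric)."""
    return SequenceMatcher(None, normalize_name(a), normalize_name(b)).ratio()


def year_similarity(a, b) -> float:
    """1.0 if equal, 0.0 if different, 0.5 when either value is missing."""
    if a is None or b is None:
        return 0.5
    return 1.0 if a == b else 0.0


def similarity_vector(ref_a: dict, ref_b: dict) -> dict:
    """Compare two person references attribute by attribute."""
    return {
        "name": name_similarity(ref_a["name"], ref_b["name"]),
        "birth": year_similarity(ref_a.get("birth"), ref_b.get("birth")),
    }


def is_match(sim: dict) -> bool:
    """Hand-written decision tree; the thresholds are hypothetical."""
    if sim["name"] >= 0.9:
        # Near-identical names match unless the birth years conflict.
        return sim["birth"] > 0.0
    if sim["name"] >= 0.75:
        # Weaker name evidence needs a confirmed birth year.
        return sim["birth"] == 1.0
    return False


# Hypothetical references, as they might arrive from two different datasets.
ref_1 = {"name": "Pessoa, Fernando", "birth": 1888}
ref_2 = {"name": "pessoa  fernando", "birth": 1888}
ref_3 = {"name": "Pessoa, Fernando", "birth": 1950}

same_person = is_match(similarity_vector(ref_1, ref_2))
conflicting = is_match(similarity_vector(ref_1, ref_3))
```

In a real consolidation pipeline the tree would more plausibly be induced from labelled pairs than written by hand, and the similarity functions would be tuned to how person names and dates are actually recorded in bibliographic data.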