FLUX-CIM: flexible unsupervised extraction of citation metadata

  • Authors: Eli Cortez; Altigran S. da Silva; Marcos André Gonçalves; Filipe Mesquita; Edleno S. de Moura

  • Affiliations: Universidade Federal do Amazonas, Manaus, Brazil; Universidade Federal do Amazonas, Manaus, Brazil; Universidade Federal de Minas Gerais, Belo Horizonte, Brazil; Universidade Federal do Amazonas, Manaus, Brazil; Universidade Federal do Amazonas, Manaus, Brazil

  • Venue: Proceedings of the 7th ACM/IEEE-CS joint conference on Digital libraries
  • Year: 2007

Abstract

In this paper we propose a knowledge-based approach to help extract the correct components of citations in any given format. Unlike related approaches that rely on manually built knowledge bases (KBs) to recognize the components of a citation, our KB is constructed automatically from an existing set of sample metadata records from a given area (e.g., computer science or health sciences). Our approach does not rely on patterns that encode the specific delimiters of a particular citation style. It is also unsupervised, in the sense that it does not rely on a learning method that requires a training phase. These features give our technique a high degree of automation and flexibility. To demonstrate the effectiveness and applicability of the proposed approach, we ran experiments in which we applied it to extract information from citations in papers from two different domains. The results indicate precision and recall levels above 94% and perfect extraction for the large majority of the citations tested.
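
The general idea in the abstract, a KB built automatically from sample metadata records and then used to recognize citation components without style-specific delimiters, can be illustrated with a small sketch. This is not the FLUX-CIM algorithm itself: the field names, the sample records, and the relative-frequency scoring below are all assumptions made for illustration only.

```python
import re
from collections import Counter, defaultdict

# Hypothetical sample metadata records; a real KB would use many more.
SAMPLE_RECORDS = [
    {"author": "A. Silva",
     "title": "Extraction of citation metadata",
     "venue": "Digital Libraries"},
    {"author": "M. Souza",
     "title": "Unsupervised metadata matching",
     "venue": "Information Systems Journal"},
]

def tokenize(text):
    """Lowercase the text and keep only runs of letters and digits.

    Punctuation is discarded, so no citation-style delimiter is used.
    """
    return re.findall(r"[a-z0-9]+", text.lower())

def build_kb(records):
    """Count, for each field, how often each term occurs in the records."""
    kb = defaultdict(Counter)
    for record in records:
        for field, value in record.items():
            kb[field].update(tokenize(value))
    return kb

def guess_field(term, kb):
    """Return the field in which `term` is relatively most frequent.

    Returns None when the term does not occur in the KB at all.
    The scoring rule is an assumption, not the method of the paper.
    """
    best_field, best_score = None, 0.0
    for field, counts in kb.items():
        total = sum(counts.values())
        score = counts[term] / total if total else 0.0
        if score > best_score:
            best_field, best_score = field, score
    return best_field

def label_citation(citation, kb):
    """Pair every term of a citation string with its guessed field."""
    return [(term, guess_field(term, kb)) for term in tokenize(citation)]

kb = build_kb(SAMPLE_RECORDS)
labels = label_citation("Silva. Citation extraction. Digital Libraries", kb)
```

In this sketch no training phase is involved: the KB is just a set of term counts, and terms that never appear in it are left unlabeled (`None`). A fuller approach would need some way to assign those terms, for instance from the labels of neighboring terms.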