A language independent approach for named entity recognition in subject headings

  • Authors: Nuno Freire; José Borbinha; Pável Calado

  • Affiliations: Instituto Superior Técnico, Technical University of Lisbon, Lisboa, Portugal and The European Library, National Library of the Netherlands, The Hague, Netherlands; Instituto Superior Técnico, Technical University of Lisbon, Lisboa, Portugal; Instituto Superior Técnico, Technical University of Lisbon, Lisboa, Portugal

  • Venue: TPDL'11 Proceedings of the 15th international conference on Theory and practice of digital libraries: research and advanced technology for digital libraries
  • Year: 2011

Abstract

Subject heading systems are knowledge organization tools that libraries have developed over many years. The Simple Knowledge Organization System (SKOS) provides a practical way to represent subject heading systems using the Resource Description Framework (RDF), and several libraries have taken the initiative to publish their subject heading systems as linked open data. Each subject heading describes a concept. In the majority of cases, however, a single subject heading is actually a combination of several concepts, such as a topic bounded by geographical and temporal scopes. In these cases, the label of the concept carries several sub-concepts that are not represented in structured form. Our work explores machine learning techniques to recognize the sub-concepts represented in the labels of SKOS subject headings. This paper describes a language-independent named entity recognition technique based on conditional random fields, a machine learning algorithm for sequence labelling. The technique was evaluated on a subset of the Library of Congress Subject Headings, where we measured the recognition of geographic concepts, topics, time periods and historical periods. It achieved an overall F1 score of 0.98.
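
To make the sequence-labelling framing concrete, the sketch below shows how a subject heading label might be split into tokens and turned into per-token feature dictionaries of the kind a conditional random field toolkit typically consumes. It is an illustration only, not the authors' method: the tokenizer, the feature names, the `<BOS>`/`<EOS>` boundary markers and the sample heading are all assumptions, and the features were chosen because they rely on surface form rather than on language-specific resources.

```python
import re

# Hypothetical tokenizer: runs of word characters, or runs of punctuation
# (so the "--" subdivision separator stays a single token).
TOKEN_RE = re.compile(r"\w+|[^\w\s]+", re.UNICODE)


def tokenize(label):
    """Split a heading label into word and punctuation tokens."""
    return TOKEN_RE.findall(label)


def token_features(tokens, i):
    """Surface-form features for token i (illustrative, not from the paper)."""
    tok = tokens[i]
    return {
        "lower": tok.lower(),
        "has_digit": any(ch.isdigit() for ch in tok),
        "is_capitalized": tok[:1].isupper(),
        "is_punct": not any(ch.isalnum() for ch in tok),
        # Neighbouring tokens give the model local context.
        "prev": tokens[i - 1].lower() if i > 0 else "<BOS>",
        "next": tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>",
    }


def label_features(label):
    """Return the tokens of a label and one feature dict per token."""
    tokens = tokenize(label)
    return tokens, [token_features(tokens, i) for i in range(len(tokens))]


# Illustrative heading: a topic with a temporal subdivision.
tokens, feats = label_features("Art, Portuguese -- 20th century")
for tok, f in zip(tokens, feats):
    print(tok, f)
```

A sequence labeller trained on such features would assign one tag per token (for example topic, place, time period, or none), and consecutive tokens with the same tag would then be grouped into the recognized sub-concepts.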