An experience developing a semantic annotation system in a media group

  • Authors:
  • Angel L. Garrido;Oscar Gómez;Sergio Ilarri;Eduardo Mena

  • Affiliations:
  • Grupo Heraldo - Grupo La Información, Pamplona, Zaragoza, Spain;Grupo Heraldo - Grupo La Información, Pamplona, Zaragoza, Spain;IIS Department, University of Zaragoza, Zaragoza, Spain;IIS Department, University of Zaragoza, Zaragoza, Spain

  • Venue:
  • NLDB'12 Proceedings of the 17th international conference on Applications of Natural Language Processing and Information Systems
  • Year:
  • 2012

Abstract

Nowadays, media companies have difficulty managing the large volume of news received from agencies and of articles produced in-house. Journalists and documentalists must perform categorization tasks every day. An additional difficulty is the usually large number of terms in a thesaurus, the tool typically used to tag news in the media. In this paper, we present a new method to tackle the problem of information extraction over a set of texts whose annotation must be composed of thesaurus elements. The method consists of applying lemmatization, extracting keywords, and finally using a combination of Support Vector Machines (SVM), ontologies, and heuristics to deduce appropriate tags for the annotation. We evaluated it on a real, changing set of news items and compared our tagging with the annotation performed by a real documentation department, obtaining very good results.
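The abstract names the pipeline stages but not their details. The following is a minimal, hypothetical Python sketch of how such stages could fit together: a toy dictionary lemmatizer, frequency-based keyword extraction, one linear SVM per thesaurus descriptor trained with the Pegasos subgradient method, and a simple heuristic that adds a descriptor when a keyword matches a thesaurus entry term. The lemma table, stopword list, `THESAURUS` mapping, training sentences, and all function names are illustrative assumptions, not the authors' resources or implementation, and the ontology component is not modeled.

```python
from collections import Counter

# Hypothetical toy resources: the abstract does not specify the lemmatizer,
# keyword extractor, thesaurus, or SVM features actually used.
LEMMAS = {"elections": "election", "voted": "vote", "votes": "vote",
          "goals": "goal", "scored": "score", "matches": "match"}
STOPWORDS = {"the", "a", "in", "of", "and", "to", "on"}
THESAURUS = {"election": "POLITICS", "parliament": "POLITICS",
             "goal": "SPORTS", "referee": "SPORTS"}  # entry term -> descriptor
BIAS = "__bias__"


def lemmatize(text):
    """Lowercase, strip punctuation, and map each token to a lemma."""
    tokens = [t.strip(".,;:!?\"'").lower() for t in text.split()]
    return [LEMMAS.get(t, t) for t in tokens if t]


def keywords(lemmas, k=10):
    """Keep the k most frequent lemmas that are not stopwords."""
    counts = Counter(l for l in lemmas if l not in STOPWORDS)
    return [w for w, _ in counts.most_common(k)]


def features(text):
    """Binary bag-of-keywords features plus a constant bias feature."""
    return set(keywords(lemmatize(text))) | {BIAS}


def score(w, x):
    """Linear decision value <w, x> for a binary feature set x."""
    return sum(w.get(f, 0.0) for f in x)


def train_linear_svm(samples, labels, lam=0.01, epochs=50):
    """Pegasos: subgradient descent on the L2-regularized hinge loss.

    Samples are visited in a fixed order so the result is deterministic.
    """
    w, t = {}, 0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            t += 1
            eta = 1.0 / (lam * t)
            margin = y * score(w, x)  # margin under the current weights
            for f in w:  # regularizer step: shrink every weight
                w[f] *= 1.0 - eta * lam
            if margin < 1:  # hinge-loss step only when the margin is violated
                for f in x:
                    w[f] = w.get(f, 0.0) + eta * y
    return w


def train_taggers(docs, tag_sets):
    """One-vs-rest: one binary SVM per thesaurus descriptor."""
    samples = [features(d) for d in docs]
    tags = sorted(set().union(*tag_sets))
    return {tag: train_linear_svm(samples, [1 if tag in ts else -1 for ts in tag_sets])
            for tag in tags}


def annotate(text, models):
    """Union of SVM-positive descriptors and direct thesaurus matches."""
    x = features(text)
    by_svm = {tag for tag, w in models.items() if score(w, x) > 0}
    by_thesaurus = {THESAURUS[k] for k in x if k in THESAURUS}
    return by_svm | by_thesaurus


docs = ["The elections were held and citizens voted",
        "Parliament votes on the new law",
        "The striker scored two goals in the match",
        "Fans celebrated the goals of the team"]
tag_sets = [{"POLITICS"}, {"POLITICS"}, {"SPORTS"}, {"SPORTS"}]
models = train_taggers(docs, tag_sets)
print(annotate("Citizens voted in regional elections", models))  # {'POLITICS'}
print(annotate("The referee stopped the match", models))  # {'SPORTS'}
```

Training one binary classifier per descriptor lets a news item receive several tags at once, and in this sketch the thesaurus heuristic can still assign a descriptor when the SVM for it does not fire.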