An experience developing a semantic annotation system in a media group

Authors:
Angel L. Garrido;Oscar Gómez;Sergio Ilarri;Eduardo Mena
Affiliations:
Grupo Heraldo - Grupo La Información, Pamplona, Zaragoza, Spain;Grupo Heraldo - Grupo La Información, Pamplona, Zaragoza, Spain;IIS Department, University of Zaragoza, Zaragoza, Spain;IIS Department, University of Zaragoza, Zaragoza, Spain
Venue:
NLDB'12 Proceedings of the 17th international conference on Applications of Natural Language Processing and Information Systems
Year:
2012

Abstract

Media companies today struggle to manage large volumes of news from agencies together with articles written in-house. Journalists and documentalists face categorization tasks every day, and the work is harder because the thesaurus, the typical tool used to tag news in the media, usually contains a very large list of words. In this paper, we present a new method for information extraction over a set of texts where the annotation must be composed of thesaurus elements. The method applies lemmatization, extracts keywords, and finally uses a combination of Support Vector Machines (SVM), ontologies, and heuristics to deduce appropriate tags for the annotation. We evaluated it on a real, changing set of news and compared our tagging with the annotation performed by a real documentation department, obtaining very good results.
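
The first two stages of the method described above (lemmatization and keyword extraction), followed by a restriction of the output to thesaurus terms, can be sketched roughly as follows. This is only an illustration and not the authors' implementation: the lemma table, stopword list, thesaurus layout, and the `min_hits` threshold are all hypothetical, and a simple overlap heuristic stands in for the SVM and ontology stages, which the abstract does not detail.

```python
import re
from collections import Counter

# Hypothetical toy resources; a real system would rely on a full
# lemmatizer and the media group's own thesaurus.
LEMMAS = {"elections": "election", "voted": "vote",
          "votes": "vote", "parties": "party"}
STOPWORDS = {"the", "in", "a", "of", "and", "for", "on"}

# Assumed thesaurus layout: tag -> lemmas that suggest the tag.
THESAURUS = {
    "POLITICS": {"election", "vote", "party"},
    "SPORTS": {"match", "goal", "team"},
}


def lemmatize(text):
    """Lowercase, tokenize, and map each token to its lemma if known."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [LEMMAS.get(token, token) for token in tokens]


def keywords(lemmas, top_n=5):
    """Return the top_n most frequent lemmas that are not stopwords."""
    counts = Counter(lemma for lemma in lemmas if lemma not in STOPWORDS)
    return [word for word, _ in counts.most_common(top_n)]


def suggest_tags(text, min_hits=2):
    """Propose thesaurus tags whose term sets share at least min_hits keywords.

    Stand-in heuristic only: the SVM and ontology steps are omitted.
    """
    found = set(keywords(lemmatize(text)))
    return sorted(tag for tag, terms in THESAURUS.items()
                  if len(found & terms) >= min_hits)


if __name__ == "__main__":
    print(suggest_tags("The parties voted in the elections and the votes were counted"))
```

Because every suggested tag is drawn from the thesaurus keys, the output is by construction composed only of thesaurus elements, which is the constraint the abstract places on the annotation. In a fuller version, a trained classifier would replace the overlap count in `suggest_tags`.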