Machine Learning
Text Categorization with Suport Vector Machines: Learning with Many Relevant Features
ECML '98 Proceedings of the 10th European Conference on Machine Learning
Ontology-based information extraction: An introduction and a survey of current approaches
Journal of Information Science
NASS: News Annotation Semantic System
ICTAI '11 Proceedings of the 2011 IEEE 23rd International Conference on Tools with Artificial Intelligence
Personalized News Filtering and Summarization on the Web
ICTAI '11 Proceedings of the 2011 IEEE 23rd International Conference on Tools with Artificial Intelligence
Hi-index | 0.00 |
Nowadays media companies have difficulties for managing large amounts of news from agencies and self-made articles. Journalists and documentalists must face categorization tasks every day. There is also an additional trouble due to the usual large size of the list of words in a thesaurus, the typical tool used to tag news in the media. In this paper, we present a new method to tackle the problem of information extraction over a set of texts where the annotation must be composed by thesaurus elements. The method consists of applying lemmatization, obtaining keywords, and finally using a combination of Support Vector Machines (SVM), ontologies and heuristics to deduce appropriate tags for the annotation. We have evaluated it with a real set of changing news and we compared our tagging with the annotation performed by a real documentation department, obtaining very good results.