Learning to tag text from rules and examples

Authors:
Michelangelo Diligenti;Marco Gori;Marco Maggini
Affiliations:
Dipartimento di Ingegneria dell'Informazione, Università di Siena, Italy;Dipartimento di Ingegneria dell'Informazione, Università di Siena, Italy;Dipartimento di Ingegneria dell'Informazione, Università di Siena, Italy
Venue:
AI*IA'11 Proceedings of the 12th international conference on Artificial intelligence around man and beyond
Year:
2011

Abstract

Tagging has become a popular way to improve access to resources, especially in social networks and folksonomies. Most resource-sharing tools let community members label the available items manually. However, manual tagging can become inconsistent, especially as the tag vocabulary grows and users no longer comply with a shared semantic knowledge. Automatic tagging can therefore be an effective way to complete the manually added tags, especially for dynamic or very large document collections such as the Web. However, an automatic text tagger trained on user-inserted tags may inherit the inconsistencies of the training data. In this paper, we propose a novel approach in which a set of text categorizers, each associated with a tag in the vocabulary, is trained both from examples and from a higher-level abstract representation consisting of first-order logic (FOL) clauses that describe semantic rules constraining the use of the corresponding tags. The FOL clauses are compiled into a set of equivalent continuous constraints, and the integration of logic and learning is implemented in a multi-task learning scheme. In particular, we exploit the mathematical apparatus of kernel machines, casting the problem as the primal optimization of a function composed of the loss on the supervised examples, a regularization term, and a penalty term that enforces the constraints obtained by converting the logic knowledge. The experimental results show that the proposed approach provides a significant accuracy improvement on the tagging of BibTeX entries.
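
The three-part objective described in the abstract (supervised loss, regularization, and a penalty from the compiled logic constraints) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: two hypothetical tags A and B linked by the single rule "for all x, A(x) implies B(x)", the product t-norm as the way to turn that rule into a continuous constraint, a squared loss, and linear models with a sigmoid output standing in for the kernel machines. The names `rule_penalty`, `objective`, `lam_reg`, and `lam_rule` are likewise hypothetical.

```python
import numpy as np

def sigmoid(z):
    # Squash raw scores into [0, 1] so they can be read as fuzzy truth values.
    return 1.0 / (1.0 + np.exp(-z))

def rule_penalty(a, b):
    # Assumed conversion of "A(x) => B(x)" with the product t-norm:
    # the rule is violated to the degree a * (1 - b), averaged over all x.
    return float(np.mean(a * (1.0 - b)))

def objective(W, X_lab, Y_lab, X_all, lam_reg=0.01, lam_rule=1.0):
    # W has shape (2, d): row 0 scores tag A, row 1 scores tag B (multi-task).
    P_lab = sigmoid(X_lab @ W.T)                  # predictions on labeled examples
    loss = float(np.mean((P_lab - Y_lab) ** 2))   # assumed squared loss
    reg = float(np.sum(W ** 2))                   # regularization term
    P_all = sigmoid(X_all @ W.T)                  # rule is checked on all documents
    penalty = rule_penalty(P_all[:, 0], P_all[:, 1])
    return loss + lam_reg * reg + lam_rule * penalty

# Toy data: 3 labeled documents with 2 features, plus 2 unlabeled ones.
X_lab = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
Y_lab = np.array([[1.0, 1.0], [0.0, 1.0], [0.0, 0.0]])  # columns: tag A, tag B
X_all = np.vstack([X_lab, np.array([[0.5, 0.5], [2.0, 0.0]])])

W = np.zeros((2, 2))
print(objective(W, X_lab, Y_lab, X_all))
```

In this sketch the penalty is evaluated on labeled and unlabeled documents alike, so the rule can shape the categorizers even where no tags are available. Minimizing the objective over `W` with any gradient-based optimizer would then trade off fit to the user-provided tags against consistency with the rule.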