Text Categorization with Support Vector Machines: Learning with Many Relevant Features
ECML '98 Proceedings of the 10th European Conference on Machine Learning
Data extraction and label assignment for web databases
WWW '03 Proceedings of the 12th international conference on World Wide Web
Automatic acquisition of hyponyms from large text corpora
COLING '92 Proceedings of the 14th conference on Computational linguistics - Volume 2
Positioning unknown words in a thesaurus by using information extracted from a corpus
COLING '96 Proceedings of the 16th conference on Computational linguistics - Volume 2
Automatic expansion of domain-specific lexicons by term categorization
ACM Transactions on Speech and Language Processing (TSLP)
Automatic term categorization by extracting knowledge from the Web
Proceedings of the 2006 conference on ECAI 2006: 17th European Conference on Artificial Intelligence August 29 -- September 1, 2006, Riva del Garda, Italy
This paper proposes a system for automatically categorizing terms or lexical entities into a predefined set of semantic domains. We present an approach that exploits the knowledge available on the Web to create a profile of each term or entity (Entity Context Lexicon - ECL). Each profile is simply a list of terms (similar to the Bag-Of-Words representation in text categorization), composed primarily of the words that often appear in the same contexts as the entity. These profiles model the contexts in which the entity usually appears, and they can subsequently be processed by an automatic classifier. Moreover, we propose and validate a profile-based categorization model developed for this particular task, which uses the ECLs of the training entities to build a profile for each class (Class-Context Lexicon - CCL). Finally, we propose a technique for dealing with multi-label classification based on a decision module that exploits a neural network. We show the effectiveness of the proposed approach on a term categorization task using a standard benchmark composed of a set of domain-specific lexicons (WordNetDomains).
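The profile-based idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify term weighting, context extraction, or the similarity measure, so this sketch assumes raw co-occurrence counts, whole snippets as contexts, single-token entities, and cosine similarity. The function names (build_ecl, build_ccl, rank_classes) and the toy snippets are hypothetical, and the neural-network decision module for multi-label assignment is not reproduced.

```python
from collections import Counter
import math
import re


def build_ecl(entity, snippets, top_k=50):
    """Build an Entity Context Lexicon: words co-occurring with the entity.

    Assumption: the entity is a single token and each snippet is one context.
    """
    target = entity.lower()
    counts = Counter()
    for text in snippets:
        tokens = re.findall(r"[a-z]+", text.lower())
        if target in tokens:
            counts.update(t for t in tokens if t != target)
    # Keep only the top_k most frequent context words.
    return Counter(dict(counts.most_common(top_k)))


def build_ccl(ecls):
    """Build a Class Context Lexicon by summing the ECLs of training entities."""
    total = Counter()
    for ecl in ecls:
        total.update(ecl)
    return total


def cosine(a, b):
    """Cosine similarity between two sparse term-weight profiles."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_classes(ecl, ccls):
    """Return (score, label) pairs sorted from most to least similar class."""
    return sorted(((cosine(ecl, ccl), label) for label, ccl in ccls.items()),
                  reverse=True)


# Toy training data (invented for illustration, not from the benchmark).
training = {
    "medicine": {"aspirin": ["aspirin relieves pain and fever",
                             "doctors prescribe aspirin for pain"]},
    "sport": {"goalkeeper": ["the goalkeeper saved the penalty",
                             "a goalkeeper defends the goal in football"]},
}
ccls = {
    label: build_ccl(build_ecl(e, s) for e, s in entities.items())
    for label, entities in training.items()
}
ranking = rank_classes(build_ecl("ibuprofen", ["ibuprofen reduces pain and fever"]), ccls)
print(ranking)
```

In this sketch each class gets a single ranked score; a multi-label setting such as the one described in the abstract would need a further decision step over these scores to choose how many classes to assign.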