We discuss an approach to the automatic expansion of domain-specific lexicons by means of term categorization, a novel task employing techniques from information retrieval (IR) and machine learning (ML). Specifically, we view the expansion of such lexicons as a process of learning previously unknown associations between terms and domains. The process generates, for each c_i in a set C = {c_1, ..., c_m} of domains, a lexicon L_1^i, bootstrapping from an initial lexicon L_0^i and a set of documents θ given as input. The method is inspired by text categorization (TC), the discipline concerned with labelling natural language texts with labels from a predefined set of domains, or categories. However, while TC deals with documents represented as vectors in a space of terms, we formulate the task of term categorization as one in which terms are (dually) represented as vectors in a space of documents, and in which terms (instead of documents) are labelled with domains.
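The dual representation can be illustrated with a minimal Python sketch. It is not the method of the paper: the abstract does not name the learner, so a simple centroid-plus-cosine rule is swapped in here purely for illustration. The function names (`term_vectors`, `cosine`, `expand_lexicon`), the `threshold` parameter, and the toy documents are all hypothetical. Each term is represented as a vector of its counts over the input documents, and a term joins the lexicon of a domain when its vector is close enough to the centroid of the seed terms.

```python
import math
from collections import Counter


def term_vectors(docs):
    """Map each term to a vector of its counts over the documents.

    This is the dual of the usual TC setup: one vector per term,
    one dimension per document.
    """
    vectors = {}
    for j, doc in enumerate(docs):
        for term, count in Counter(doc.lower().split()).items():
            vectors.setdefault(term, [0] * len(docs))[j] = count
    return vectors


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand_lexicon(seed, docs, threshold=0.4):
    """Return the seed lexicon plus terms whose document vectors resemble it.

    Assumption: a centroid of the seed-term vectors stands in for a
    trained classifier; `threshold` is an arbitrary illustrative cutoff.
    """
    vectors = term_vectors(docs)
    known = [vectors[t] for t in seed if t in vectors]
    if not known:
        return set(seed)
    centroid = [sum(column) / len(known) for column in zip(*known)]
    added = {
        term
        for term, vec in vectors.items()
        if term not in seed and cosine(vec, centroid) >= threshold
    }
    return set(seed) | added


# Toy input: two pharmacology-like documents and two sports-like ones.
docs = [
    "aspirin dosage tablet",
    "tablet dosage ibuprofen",
    "goal striker match",
    "match referee goal",
]
initial_lexicon = {"aspirin", "dosage"}  # plays the role of L_0^i
expanded = expand_lexicon(initial_lexicon, docs)  # plays the role of L_1^i
print(sorted(expanded))
```

On this toy input the terms "tablet" and "ibuprofen" share documents with the seed terms and are added, while the sports terms never co-occur with the seeds and are left out. A real system would replace the centroid rule with a trained classifier and weight the document dimensions rather than use raw counts.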