Expanding domain-specific lexicons by term categorization

Authors:
Henri Avancini;Alberto Lavelli;Bernardo Magnini;Fabrizio Sebastiani;Roberto Zanoli
Affiliations:
ISISTAN-UNCPBA, 7000 Tandil, Argentina;ITC-irst, 38050 Trento, Italy;ITC-irst, 38050 Trento, Italy;ISTI-CNR, 56124 Pisa, Italy;ITC-irst, 38050 Trento, Italy
Venue:
Proceedings of the 2003 ACM symposium on Applied computing
Year:
2003

Abstract

We discuss an approach to the automatic expansion of domain-specific lexicons by means of term categorization, a novel task employing techniques from information retrieval (IR) and machine learning (ML). Specifically, we view the expansion of such lexicons as a process of learning previously unknown associations between terms and domains. The process generates, for each c_i in a set C = {c_1, ..., c_m} of domains, a lexicon L_1^i, bootstrapping from an initial lexicon L_0^i and a set of documents θ given as input. The method is inspired by text categorization (TC), the discipline concerned with labelling natural language texts with labels from a predefined set of domains, or categories. However, while TC deals with documents represented as vectors in a space of terms, we formulate the task of term categorization as one in which terms are (dually) represented as vectors in a space of documents, and in which terms (instead of documents) are labelled with domains.
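
The dual representation described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy corpus, the stopword list, the function names, and in particular the learner. A cosine comparison against the centroid of the seed terms stands in for whatever supervised learner is actually used, which the abstract does not specify. The sketch only shows how terms, rather than documents, become the objects that are represented as vectors and labelled with domains.

```python
import math
from collections import Counter

# Hypothetical toy corpus standing in for the input document set (theta).
DOCS = [
    "the striker scored a goal in the match",
    "the goalkeeper saved the goal in the match",
    "the bank raised the interest rate on the loan",
    "the loan has a high interest rate at the bank",
]

# Hypothetical stopword list, used only to keep the toy output readable.
STOPWORDS = {"the", "a", "in", "on", "at", "has"}


def term_vectors(docs):
    """Represent each term as a vector over documents (dual of bag-of-words)."""
    vectors = {}
    for j, doc in enumerate(docs):
        for term, count in Counter(doc.split()).items():
            if term in STOPWORDS:
                continue
            # Component j of a term's vector is its frequency in document j.
            vectors.setdefault(term, [0.0] * len(docs))[j] = float(count)
    return vectors


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if degenerate)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def centroid(vecs):
    """Component-wise mean of a list of vectors."""
    n = len(vecs)
    return [sum(col) / n for col in zip(*vecs)]


def expand_lexicons(docs, seed_lexicons, threshold=0.9):
    """Return, per domain, the seed lexicon plus terms similar to its centroid.

    This centroid rule is an illustrative stand-in, not the method of the paper.
    """
    vectors = term_vectors(docs)
    seeds = set().union(*seed_lexicons.values())
    expanded = {}
    for domain, seed in seed_lexicons.items():
        proto = centroid([vectors[t] for t in seed if t in vectors])
        new_terms = {
            t for t, v in vectors.items()
            if t not in seeds and cosine(v, proto) >= threshold
        }
        expanded[domain] = set(seed) | new_terms
    return expanded


initial = {"sport": {"goal"}, "finance": {"loan"}}
for domain, lexicon in expand_lexicons(DOCS, initial).items():
    print(domain, sorted(lexicon))
```

On this toy corpus the sketch adds "match" to the sport lexicon and "bank", "interest" and "rate" to the finance lexicon, because those terms occur in exactly the same documents as the seed terms. A term such as "striker", which occurs in only one of the two sport documents, falls below the 0.9 threshold and is left unlabelled.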