Expanding domain-specific lexicons by term categorization

  • Authors:
  • Henri Avancini;Alberto Lavelli;Bernardo Magnini;Fabrizio Sebastiani;Roberto Zanoli

  • Affiliations:
  • ISISTAN-UNCPBA, 7000 Tandil, Argentina;ITC-irst, 38050 Trento, Italy;ITC-irst, 38050 Trento, Italy;ISTI-CNR, 56124 Pisa, Italy;ITC-irst, 38050 Trento, Italy

  • Venue:
  • Proceedings of the 2003 ACM symposium on Applied computing
  • Year:
  • 2003

Abstract

We discuss an approach to the automatic expansion of domain-specific lexicons by means of term categorization, a novel task employing techniques from information retrieval (IR) and machine learning (ML). Specifically, we view the expansion of such lexicons as a process of learning previously unknown associations between terms and domains. The process generates, for each c_i in a set C = {c_1, ..., c_m} of domains, a lexicon L_1^i, bootstrapping from an initial lexicon L_0^i and a set of documents θ given as input. The method is inspired by text categorization (TC), the discipline concerned with labelling natural language texts with labels from a predefined set of domains, or categories. However, while TC deals with documents represented as vectors in a space of terms, we formulate the task of term categorization as one in which terms are (dually) represented as vectors in a space of documents, and in which terms (instead of documents) are labelled with domains.
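
The dual representation described in the abstract can be illustrated with a minimal sketch. Here each term becomes a vector with one component per document, and unlabelled terms are assigned to a domain when they are close to the seed terms of that domain. The abstract does not say which learner the authors use, so the nearest-centroid rule with cosine similarity is an assumption made purely for illustration, as are the function names (`term_vectors`, `cosine`, `expand_lexicons`), the `threshold` parameter, the raw-count weighting, and the toy corpus.

```python
import math
from collections import Counter


def term_vectors(docs):
    """Represent each term as a vector with one component per document."""
    vectors = {}
    for j, doc in enumerate(docs):
        for term, count in Counter(doc.lower().split()).items():
            # Component j holds the raw frequency of the term in document j.
            vectors.setdefault(term, [0.0] * len(docs))[j] = float(count)
    return vectors


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand_lexicons(docs, seed_lexicons, threshold=0.5):
    """Expand each seed lexicon with terms similar to the centroid of its seeds.

    Assumption: a nearest-centroid rule stands in for whatever learner
    the paper actually uses.
    """
    vectors = term_vectors(docs)
    expanded = {}
    for domain, seeds in seed_lexicons.items():
        known = [vectors[t] for t in seeds if t in vectors]
        if not known:
            # No seed term occurs in the corpus, so nothing can be learned.
            expanded[domain] = set(seeds)
            continue
        centroid = [sum(column) / len(known) for column in zip(*known)]
        new_terms = {t for t, v in vectors.items()
                     if cosine(v, centroid) >= threshold}
        expanded[domain] = set(seeds) | new_terms
    return expanded


# Toy corpus and seed lexicons (hypothetical data).
docs = [
    "striker goal penalty",
    "goal penalty referee",
    "bond yield inflation",
    "inflation yield market",
]
seeds = {"sport": {"goal"}, "finance": {"yield"}}

expanded = expand_lexicons(docs, seeds)
for domain in sorted(expanded):
    print(domain, sorted(expanded[domain]))
```

In this sketch, terms that co-occur with a seed term in the same documents end up with similar document vectors, which is what allows labels to propagate from the initial lexicon to previously unlabelled terms. Raising `threshold` makes the expansion more conservative.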